PREFACE


This publication contains the papers presented at the Symposium on Statistical Hydrology
held at Tucson, Ariz., August 31 through September 2, 1971. The Symposium was organized
and sponsored by the International Association for Statistics in Physical Science
(IASPS) and the U.S. Department of Agriculture (USDA) in cooperation with the University
of Arizona.

The symposium was planned to generate cooperation and facilitate discussions between
hydrologists and statisticians on problems in hydrology and on the application of probabilistic
and statistical methodologies to these problems. General topics covered included preparation
of stochastic models of rainfall, runoff, and sediment; estimation of extreme values; and
optimization of hydrologic modeling techniques.

A total of 79 engineers, hydrologists, statisticians, and others from 18 States and two
foreign countries attended the conference. Participants represented Federal and State
agencies, universities, and students. A list of those in attendance is included in these proceedings.

The idea for this symposium was conceived by Professor J. Neyman, University of
California at Berkeley. Initial organization and planning were developed at Berkeley in
September 1970 by a group consisting of Professors J. Neyman and E. Scott, Berkeley;
Professor F. N. David, University of California at Riverside; Professor P. A. P. Moran,
Australian National University of Canberra; and D. L. Brakensiek (Beltsville, Md.), D. A.
Woolhiser (Fort Collins, Colo.), K. G. Renard (Tucson, Ariz.) and H. B. Osborn (Tucson,
Ariz.) of the Agricultural Research Service, U.S. Department of Agriculture. Professor R. K.
Linsley was unable to attend but contributed valuable suggestions to the organizing
committee. A second planning conference was held in January 1971 in Tucson. Professors Jack
Denny, Chester Kisiel, and Lucien Duckstein from the University of Arizona contributed to
the final organization of the symposium.

Program organization of the symposium was handled by cochairmen P. A. P. Moran and
D.  L.  Brakensiek.  Local  arrangements  were  made  by  K.  G.  Renard.  Personnel  at  the  USDA 
Southwest  Rangeland  Watershed  Research  Center  at  Tucson  assisted  in  many  of  the  details. 
not necessarily represent the view of the U.S. Department of Agriculture.
as they were supplied by the author of each paper. Italic numbers in parentheses refer
PROCEEDINGS OF THE SYMPOSIUM ON STATISTICAL
HYDROLOGY  HELD  AT  TUCSON,  ARIZONA, 
AUGUST  31-SEPTEMBER  2,  1971 

OBJECTIVE 

The  Symposium  on  Statistical  Hydrology  has  been  developed  to  facilitate  discussions 
between  hydrologists  and  statisticians  on  problems  in  hydrology  and  the  application  of 
probabilistic  and  statistical  methodologies  to  their  investigation  and  solution.  The  usual 
obstacle to these studies is connected with the fact that the epoch of scientific encyclopedists
is  long  since  past  and  that  if  an  individual  has  spent  a  substantial  part  of  his  life  on  probability 
and  statistics,  his  familiarity  with  particular  domains  of  science  where  his  specialty  may  be 
relevant  must  only  be  superficial  at  best.  However,  even  with  this  limitation,  the  process  of 
becoming  actively  involved  in  the  study  of  a  nontrivial  natural  phenomenon  requires  from  the 
statistician  considerable  effort  to  learn  the  necessary  rudiments  of  the  situation.  The  difficulty 
is  enhanced  by  the  fact  that  each  active  discipline  develops  a  large  arsenal  of  its  own  concepts 
and  also  its  own  jargon. 

These  are  the  difficulties  of  a  statistician  entering  into  cooperation  with  an  experimental 
scientist.  Naturally,  the  latter  has  difficulties  of  his  own,  so  to  speak  in  reverse.  Frequently, 
statistical  cookbooks  are  used  to  resolve  these,  and  often  this  is  all  that  is  needed.  Also,  the 
interest  in  the  chance  mechanisms  is  so  intense  that  the  experimental  scientist  learns  the 
underlying mathematical disciplines and becomes a probabilist himself.

To  generate  cooperation  between  statisticians  and  interested  experimental  scientists, 
a  number  of  symposia  have  been  arranged.  At  these  symposia  each  can  make  an  effort  to 
inform  the  other  in  mutual,  intelligible  terms.  To  be  really  successful,  such  meetings  must 
last  for  a  few  days  and  provide  ample  opportunity  for  informal  discussion.  This  is  the  origin 
of the so-called IASPS Satellite Symposia.


SOME  PROBLEMS  IN  PURE  AND  APPLIED  STOCHASTIC  HYDROLOGY 

By V. Klemes


Abstract 

Since the aim of pure hydrology is the understanding
of the behavior of hydrologic phenomena, it
is necessary that stochastic models of hydrologic
processes be physically founded. This paper presents
several examples of how probabilistic description
of the mechanism of runoff formation leads to
physically justified runoff distribution models.

For some applied purposes, the numerical
values of parameters seem to be much more important
than the forms of models, and the physical
justifiability of a model seems to be irrelevant.
However, a physically sound concept of a model
can, at least to some extent, compensate for the
inaccuracy of parameter estimates resulting from
small sample sizes of available empirical data. This
is illustrated by an example relating various runoff
distribution models to the stationary solution of the
storage equation. Usefulness of the latter as a basis
for optimization of a single storage reservoir is
challenged, and optimization on a regional basis is
advocated.

Introduction 

Pure hydrology is concerned with hydrological
processes as such. It should strive to explain
how things happen and why they behave as they
do, and its methods should be independent of any
eventual practical use of the acquired knowledge.
In applied hydrology, on the other hand, the major
concern should be to know to what extent our findings
about hydrological processes are relevant to
the practical decisionmaking process in water resource
management, and to what extent a more precise
knowledge can make the decisions more rational,
the results more predictable, and the means of
achieving them more economical.

Logical as this concept seems to be, it is far from
being implemented in hydrology in general and in
statistical hydrology in particular. For example,
the main emphasis in stochastic analysis of hydrological
processes, which basically is the domain of
pure hydrology, has been on the fitting of various
preconceived mathematical models to empirical
data rather than on arriving at a proper model from
the physical nature of the process itself. The empirical
data representing a hydrologic event are
treated as a collection of abstract numbers that could
pertain to anything or to nothing at all. Their hydrologic
flavor, the physical substance that makes, for
instance, a precipitation record an entity entirely
distinct from, say, a record of stock market fluctuations,
is not reflected in the analysis. Thus what we
usually find is not, in fact, statistical or stochastic
hydrology but merely an illustration of statistical and
probabilistic concepts by means of hydrologic data.
Such an approach can hardly contribute to
hydrological knowledge.

In trying to improve this situation, the main problem
is to find the ways in which the physical features
of a phenomenon can be introduced into the analysis.
Some examples are shown in the first part of this
paper where the analyzed phenomenon is the probabilistic
distribution of runoff. The second part of the
paper tries to view the same problem from the point
of applied hydrology. Here the model of a hydrologic
phenomenon is merely a tool, not the objective as it
is in the former case. The objective is an engineering
decision and the tool is of interest only insofar as it
may influence the decision. If the result does not
depend on the tool, then the engineer will naturally
prefer the most convenient one, however crude and
inadequate it may seem to the toolmaker.

In the example shown, the tool is represented by a
model of the probabilistic distribution of annual
runoff, the objective being the solution of the storage
equation.
The Probabilistic Distribution of
Runoff 


Given  the  usual  length  of  record  (40  years  or 
less),  it  is  hard  to  find  a  distribution  model  for 
annual  runoff  that  can  be  rejected  on  the  customary 
5-percent  level  of  significance. 

For example, Markovic (15) showed that the
Gaussian distribution could not be rejected as a
good fit for 332 out of 446 tested samples of an average
size of 37 years. Obviously, the question of
acceptability of the Gaussian distribution for
fitting to annual runoff is not one of statistics but
one of physics. Since the physical nature of runoff
is such that it cannot take on negative values, the
application of the Gaussian distribution makes no
sense, and no statistical test, however favorable its
result may be, can reverse this fact.
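The physical objection can be made concrete with a small numerical sketch. Assuming, purely for illustration, a Gaussian model with unit mean and a few hypothetical coefficients of variation, the model always assigns some probability to negative runoff, however well it may fit a sample. The function name and the chosen values are assumptions made for this example only.

```python
from statistics import NormalDist


def prob_negative_runoff(mean, cv):
    """Probability of Q < 0 implied by a Gaussian model of runoff.

    mean: mean annual runoff (any positive unit)
    cv:   coefficient of variation (standard deviation / mean)
    """
    return NormalDist(mu=mean, sigma=cv * mean).cdf(0.0)


# Hypothetical coefficients of variation, chosen only for illustration.
for cv in (0.25, 0.5, 1.0):
    print(f"Cv = {cv:4.2f}  P(Q < 0) = {prob_negative_runoff(1.0, cv):.6f}")
```

The probability is small for low variability but never zero, and it grows quickly as the coefficient of variation increases.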

A hydrologically meaningful model must incorporate
the basic physical features of runoff, which
are (1) the above-mentioned fact that runoff cannot
be negative, (2) the impossibility of justifying any other
absolute lower limit of runoff than zero, and (3) the
impossibility of setting any other absolute upper limit to
runoff than infinity. The last condition seems unrealistic
since there certainly is some finite upper
limit of runoff, at least in the sense of a finite
quantity of water on our planet. However, since
any variable describing runoff approaches a zero
probability for values incomparably lower than such
absolute limit, and since there seems to be no
physical reason for truncation or a marked discontinuity
at any particular level, the assumption
of no upper limit is the most plausible one.

The  three  above  postulates  limit  the  range  of 
applicable  distribution  models  to  those  asymmetric 
types  with  the  lower  tail  bounded  by  zero  and  an 
unbounded  upper  tail.  Although  these  constraints 
reflect  the  bare  minimum  of  runoff  properties,  they 
in  fact  represent  about  the  maximum  of  the  physical 
input  that  has  so  far  found  its  way  into  the  analysis 
of  runoff  distribution. 

The well-known result has been the almost universal
acceptance of two distribution models of
runoff, the gamma and the lognormal types, that
can have only positive skewness, which is moreover
rigidly related to their variance.

The way of overcoming, or rather circumventing,
these two difficulties is an example of formalism of
the highest degree, which from the hydrological
viewpoint can best be classified as entirely arrogant.

If the data exhibit a much larger skewness than
that of the model, the latter is modified by introducing
into it a location parameter, that is, a postulate
of an above-zero lower limit of runoff, which is
contrary to physical evidence. Perhaps to emphasize
the disdain for physical reality, the numerical value
of this "absolute minimum" is not estimated from
any physical indications but postulated such that it
optimizes an arbitrary, entirely formal, and hydrologically
irrelevant criterion. If, on the other hand,
the data exhibit a negative or a very low positive
skewness, the fact is simply dismissed as a sampling
error. To put it briefly, the data are respected to the
extent to which they obey the rules that we have
chosen for the game.

To the author's knowledge, the first to show an
adequate respect for the above three basic physical
properties of runoff were Kritskiy and Menkel (12)
who, in 1946, suggested that a power transformation
of the gamma type be used as a model for the
distribution of runoff. This three-parameter distribution,
hereafter also referred to as the gamma-3
type, maintains its lower limit at zero, has no upper
bound, and its skewness can vary independently of
its variance.

But apparently the first person who tried to arrive
at the distribution of annual runoff from the mechanism
of its formation was Kalinin (7). He used combinatorics
as a mathematical tool, and his reasoning
was as follows.

During the year, there is a certain sequence of
alternations of the cyclonic and anticyclonic
weather conditions leading to a sequence of rainy
and rainless periods. In the first approximation, one
can consider only the average number of periods
within the year, the average length of a rainy and a
rainless period, and take the average amount of an
individual rainfall as proportional to the duration of
the rainy period. The next simplifying assumption
is that the occurrences of rainy and rainless periods
are mutually independent and have constant probabilities,
$s$ and $t = 1 - s$, respectively. In this crude
scheme, the annual amount of precipitation, $P$, is
given by the number of rainy periods, $m$, in the total
number of $n$ periods within the year, and the
annual precipitation totals have binomial density:

$$p(P) = p_n(m) = \binom{n}{m} s^m t^{n-m}. \qquad (1)$$

By letting m be continuous (n can be any number
and so may be m; this implies the possibility of
interpolation between two different values of m, and
thus the possibility of m being continuous) we
obtain the Pearson-III (gamma) distribution (6).
Taking further into account the nonlinear relationship
between precipitation P and runoff Q by approximating
it as $Q = aP^b$, Kalinin concludes that a
power transformation of gamma distribution, as
suggested by Kritskiy and Menkel (12), is a physically
justified model for distribution of annual
runoff. Kalinin shows that, the coefficient of skewness
of the binomial distribution being given as
$C_s = (t - s)/\sqrt{nst}$, annual runoff in arid regions (small values of
s) should have large positive skewness, while runoff
in humid regions (large values of s) should tend to
small positive and even negative skewness.
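The sign change of the binomial skewness with s can be checked numerically. The sketch below evaluates the coefficient of skewness $(t - s)/\sqrt{nst}$ with $t = 1 - s$ for a few values of s; the number of periods n = 50 and the values of s are hypothetical and serve only to illustrate the arid and humid cases.

```python
import math


def binomial_skewness(n, s):
    """Coefficient of skewness of a binomial distribution.

    n: number of periods within the year
    s: probability that a period is rainy (t = 1 - s is the rainless probability)
    """
    t = 1.0 - s
    return (t - s) / math.sqrt(n * s * t)


n_periods = 50  # hypothetical number of periods per year
for s in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"s = {s:.1f}  Cs = {binomial_skewness(n_periods, s):+.3f}")
```

Small s (arid conditions) gives positive skewness, s = 0.5 gives a symmetric distribution, and large s (humid conditions) gives negative skewness.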

In 1970, Klemes (10) proposed a different physically
founded approach of arriving at a theoretical
model for the distribution of runoff, based on the
stochastic theory of storage. The basic philosophy
behind his approach is this. The difference between
the time pattern of precipitation and that of resulting
runoff is due to the memory of the system transforming
one into the other. In a hydrologic system,
the memory is physically represented by a storage
capacity capable of detaining water for a certain
period of time. The storage in question can take on
many forms; it may be represented by a lake, voids
in the soil, snow and ice, channel storage, or temporary
detention on ground surface. However, one
common characteristic of all these storage types is
that they behave, in principle, as a semi-infinite
storage reservoir, for in each case the minimum
amount of water stored is zero, while there is no
definite upper limit. The process that interests us
takes place at the very bottom of such reservoirs,
and its theory is relatively well developed (16).

The particular theoretical result employed is
due to Gani (3). It states, in principle, that if the
input into a semi-infinite reservoir can be approximated
by a stationary additive random process with
nonnegative increments, and if its distribution is
close to the gamma type, then, for a constant reservoir
draft, storage is gamma-distributed with
concentration at zero (similar results were also
obtained by Gani and Prabhu (4) for a whole class
of input distributions).

Taking precipitation as input (with the above properties)
into a semi-infinite reservoir (the basin) and
noting that output Q from a reservoir is related to
its storage S by a power function (the routing
function):

$$Q = cS^d \qquad (2)$$

so that probability of zero output will asymptotically
converge to zero, the distribution of storage is likely
to be close to the gamma type. Given equation 2,
runoff should then be distributed approximately as a
power transformation of the gamma type.

The asymmetry of this distribution will largely
depend on the power d. For $d \ll 1$ the coefficient of
skewness, $C_s$, becomes negative; for $d > 1$ it is
close to that of input. Relationships between $C_s$ of
output and d, for three different distributions of
input, are shown in figure 1 (in all three cases, the
input process has been considered random with
unit mean and the coefficient of variation $C_v = 0.25$).
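The effect of the power d on skewness can be illustrated under a strong simplifying assumption: that storage S is exactly gamma-distributed, so that the moments of $Q = cS^d$ follow from the gamma function. This is only a sketch of the power transformation itself. It does not reproduce the reservoir computations behind figure 1, and the gamma shape of 16 (which corresponds to a coefficient of variation of 0.25 for S) and the function name are assumptions made for this example.

```python
import math


def power_gamma_skewness(shape, d):
    """Skewness of Q = c * S**d when S is gamma-distributed with the given shape.

    The constant c and the gamma scale cancel out of the skewness, so raw
    moments are computed for unit scale: E[S**(d*r)] = G(shape + d*r) / G(shape).
    """
    def raw_moment(r):
        return math.exp(math.lgamma(shape + d * r) - math.lgamma(shape))

    m1, m2, m3 = raw_moment(1), raw_moment(2), raw_moment(3)
    variance = m2 - m1 ** 2
    third_central = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    return third_central / variance ** 1.5


shape = 16.0  # assumed: coefficient of variation of S = 1 / sqrt(16) = 0.25
for d in (0.1, 1.0 / 3.0, 0.5, 1.0):
    print(f"d = {d:.3f}  Cs = {power_gamma_skewness(shape, d):+.3f}")
```

Under this assumption the skewness of the output falls as d decreases and turns negative for sufficiently small d, which is the qualitative behavior described above.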

Strictly speaking, the above approach best reflects
the reality in cases of relatively small basins
where virtually every rain extends over the whole
catchment and its depth has approximately uniform
areal distribution. Only in such cases will the basin,
more or less, physically resemble a single storage
reservoir.

A clue to the type of distribution of annual runoff
from a large watershed can be found in the theory of
convolutions since such runoff can be regarded, at
least conceptually, as a sum of runoffs from a
number of small subbasins.

Since runoffs x and y from two basins may
generally be correlated, the distribution of $z = x + y$:

$$f_z(z) = \int f_{x|y}(z - y)\, f_y(y)\, dy \qquad (3)$$

would involve conditional distributions $f_{x|y}$.

The problem of conditional distributions is a
difficult one in this case since nonnormal marginal
distributions and generally nonlinear correlation
between x and y are involved.

Again, we are facing a problem that can hardly
be resolved by statistics alone, that is, by fitting some
bivariate or multivariate model to the data, since
there can be a multitude of models satisfying the
boundary conditions specified by the marginal
distributions and the type of correlation. Here a
more fruitful concept than to derive conditional
distributions from a preconceived bivariate model
would be to start with conditional distributions as
a primary notion as suggested by Feller (1) and
define the marginal distribution of runoff as a
mixture.

Another example where the concept of interpreting
the distribution of runoff as a mixture is
physically sound is the case where the variable
does not represent a homogeneous phenomenon.
The distribution of maximum annual flows, which
are heterogenous in most climatic conditions, can
serve as an example. The annual maxima may be
composed of flows resulting, for instance, from
cyclonic storms, snowmelt, and hurricanes.

Figure 1. Skewness of reservoir output Q as function of power d in equation $Q = cS^d$, for random input of three different distribution
types: normal, lognormal, and gamma (in all cases, input mean $\bar{x} = 1$ and its $C_v = 0.25$).

A physically justified treatment would require a
separate analysis of the maxima from years, say,
with no hurricanes (A) and those with hurricanes
(B). A mixture of distributions A and B would then
be the correct model for the distribution of all
annual maxima combined (fig. 2).
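A mixture of this kind can be sketched as a weighted sum of the component distribution functions. In the example below the two components are taken as lognormal and the proportion of years without hurricanes is set to 0.8; both choices, and all parameter values and function names, are hypothetical and serve only to show the construction.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """Nonexceedance probability of a lognormal variate (zero for x <= 0)."""
    if x <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def mixture_cdf(x, weight_a, cdf_a, cdf_b):
    """Distribution function of a two-component mixture.

    weight_a: proportion of years belonging to component A
    cdf_a, cdf_b: distribution functions of components A and B
    """
    return weight_a * cdf_a(x) + (1.0 - weight_a) * cdf_b(x)


def cdf_a(x):
    return lognormal_cdf(x, mu=0.0, sigma=0.3)  # assumed: years with no hurricanes


def cdf_b(x):
    return lognormal_cdf(x, mu=1.0, sigma=0.3)  # assumed: years with hurricanes


for flow in (0.5, 1.0, 1.5, 2.0, 3.0, 4.0):
    print(f"x = {flow:3.1f}  F(x) = {mixture_cdf(flow, 0.8, cdf_a, cdf_b):.4f}")
```

When the components are well separated, the mixture distribution function flattens near the value of the weight before rising again, which gives the stepped shape of a combined sample.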

In figures 3 and 4, empirical distribution functions
of annual runoff of two Canadian rivers are
shown (together with formal fits by five different
models, the purpose of which will be shown in the
next section of the paper), both exhibiting apparent
discontinuities typical for many runoff samples.
Comparison with the shape of distribution function
A+B in figure 2 suggests that a sudden "jump,"
being a typical feature of the mixture, can be the
consequence of a nonhomogeneity of the sample.

However, it has to be emphasized that the concept
of mixed distributions is meaningful only if physically
justified and if the component subsamples
can be separated on physical grounds. Hydrologically,
it would make no sense if the only reason
for applying this concept were to "improve the fit."
This would create more problems than it could solve
(how many components to use, what distribution
types to assume for them, how to estimate their
parameters).

The last example of a physically founded approach
to distribution of runoff concerns annual
runoff from arctic basins or those with perennial
snow and glaciers and relatively dry summers. In
this case, the theory of storage and that of
extreme values can be found to provide appropriate
tools for deriving the distribution of runoff.

Figure 2. Density and distribution function of a mixture of two distributions.

Figure 3. Distribution function of mean annual flows, Milk River, Alberta, Canada (fitted distribution models: 1, power-transformed
gamma; 2, three-parameter lognormal; 3, lognormal; 4, gamma; 5, Gaussian).

The storage model now has the following physical
interpretation. Reservoir input is represented by
annual totals of effective precipitation. Reservoir
storage is represented by the amount of snow and
ice in the basin. The amount $v$ at the beginning of
melting season is equivalent to storage of water
in a semi-infinite reservoir immediately before
release. Its distribution $f_v(v)$ for given reservoir
draft can be found, in principle, from the distribution
and time-behavior of input.

Reservoir draft, $u$, is equivalent to the potential
annual amount of melt and its distribution $f_u(u)$
can be derived from that of annual totals of energy
available for melting.


Reservoir output, $w$, is equivalent to annual
runoff. It will be equal to the smaller of the two
simultaneously occurring variates $u_i$ and $v_i$. Its
density will be given as

$$f_w(w) = f_v(w) \int_w^\infty f_u(u)\, du + f_u(w) \int_w^\infty f_v(v)\, dv \qquad (4)$$

and its distribution function as

$$F_w(w) = F_v(w)\, F_u(w). \qquad (5)$$

This shows that the distribution of annual runoff
is here defined as an extremal distribution, which
may be rather surprising since it has not been
customary to treat annual runoff totals as extreme
values.

Figure 4. Distribution function of mean annual flows, North Saskatchewan River, Saskatchewan, Canada (fitted distribution models:
1, power-transformed gamma; 2, three-parameter lognormal; 3, lognormal; 4, gamma; 5, Gaussian).

The above-described extremal mechanism indicates
the possibility of a negatively skewed distribution
of runoff if the variance of potential melt is
much smaller than that of snow and ice storage. In
such a case (see fig. 5), the upper tail of the runoff
distribution, dominated by potential melt, would
approach the axis rapidly in comparison with the
lower tail, thus leading to $C_s < 0$.
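The extremal mechanism can be explored with a short simulation. The sketch below assumes that storage and potential melt are independent gamma variates with unit mean, with coefficients of variation of 0.5 and 0.05 respectively; these distributions, the parameter values, and the function names are hypothetical. It also reads the product form of equation 5 as a statement about probabilities of exceedance, which is the reading under which the product holds for the smaller of two independent variates.

```python
import random


def sample_skewness(series):
    """Moment estimate of the coefficient of skewness."""
    n = len(series)
    mean = sum(series) / n
    m2 = sum((x - mean) ** 2 for x in series) / n
    m3 = sum((x - mean) ** 3 for x in series) / n
    return m3 / m2 ** 1.5


def exceedance(series, level):
    """Empirical probability that a value exceeds the given level."""
    return sum(1 for x in series if x > level) / len(series)


def simulate_runoff(n_years, seed=0):
    """Simulate storage v, potential melt u, and runoff w = min(u, v)."""
    rng = random.Random(seed)
    # Assumed: storage with mean 1 and Cv 0.5 (gamma shape 4, scale 0.25).
    v = [rng.gammavariate(4.0, 0.25) for _ in range(n_years)]
    # Assumed: potential melt with mean 1 and Cv 0.05 (shape 400, scale 0.0025).
    u = [rng.gammavariate(400.0, 0.0025) for _ in range(n_years)]
    w = [min(ui, vi) for ui, vi in zip(u, v)]
    return u, v, w


u, v, w = simulate_runoff(20000)
print("skewness of storage:", round(sample_skewness(v), 3))
print("skewness of runoff: ", round(sample_skewness(w), 3))
print("exceedance of runoff at 0.95:", round(exceedance(w, 0.95), 3))
print("product of exceedances:      ", round(exceedance(u, 0.95) * exceedance(v, 0.95), 3))
```

With these assumed parameters the runoff sample is bounded closely from above by the melt potential and keeps the long lower tail of the storage, so its skewness comes out negative even though both inputs are positively skewed.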

The above examples show some of the ways
in which the physical properties of a phenomenon
can be used for deriving its statistical properties.
They also show how a physically based approach
can help to detect statistical properties, the existence
of which cannot be reliably detected by statistical
analysis of empirical samples. Thus, for
instance, from the purely qualitative properties of
the above models it can be inferred that storage of
water in the basin, whether in liquid, solid, or gaseous
form, tends to produce negatively skewed
annual runoff. If confirmed, this hypothesis can be
of considerable help in viewing the significance of
skewness of empirical samples. Partial confirmation
seems to be available and consists in the
following.

It is known both from theory and observations
that water storage increases the time dependence
in the runoff process. It is therefore natural to expect
that negative and low positive values of skewness
will be accompanied by high positive values of the
first serial correlation coefficient. The validity of
this thesis seems to be confirmed by runoff data
compiled by Yevjevich (19). Table 1 shows the above
parameters for samples from eight rivers that have
at least 40 years of records (shorter series were
excluded to reduce the chance of large sampling
errors) and exhibit the highest negative values of
skewness. The table shows that in all cases we are
concerned with rivers either having large reservoirs
in their basins (St. Lawrence River: the Great
Lakes; Missouri River: underground storage), or
being fed by glaciers (Spray and Arkansas Rivers:
the Rocky Mountains; Birs River: the Alps; Garonne
River: the Pyrenees), or being in humid climate
(Quinault River: Pacific coast).
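The two parameters discussed here can be computed from a series of annual flows as sketched below. The moment estimators used are common textbook forms and are an assumption of this example; the estimators behind the values compiled in (19) may differ in detail, for instance in bias corrections.

```python
def sample_skewness(series):
    """Moment estimate of the coefficient of skewness."""
    n = len(series)
    mean = sum(series) / n
    m2 = sum((x - mean) ** 2 for x in series) / n
    m3 = sum((x - mean) ** 3 for x in series) / n
    return m3 / m2 ** 1.5


def first_serial_correlation(series):
    """Lag-one serial correlation coefficient of a series."""
    n = len(series)
    mean = sum(series) / n
    numerator = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    denominator = sum((x - mean) ** 2 for x in series)
    return numerator / denominator


# Hypothetical short series, used only to show the calls.
flows = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_skewness(flows), first_serial_correlation(flows))
```

For real records the same two calls would be applied to each river's series of annual flows.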

Is  Physical  Justifiability  of  a  Runoff 
Model  Important  in  Applied 
Hydrology? 

A physically justified model of the distribution of
runoff may lead to a variety of rather complex analytical
forms. The situation is likely to be similar with
models of the time behavior of runoff as indicated,
for instance, by Hurst (5), Mandelbrot and Wallis (14),
and Scheidegger (17).

Figure 5. Definition sketch to distribution of runoff from snow and ice.

Table 1. First serial correlation coefficients for some negatively skewed series of annual runoffs

River           Station and location            Number of years   Coefficient of   First serial correlation
                                                of records        skewness         coefficient

Spray           Banff, Alberta                  45                -0.942           0.438
Arkansas        Salida, Colo.                   47                -0.407           0.377
Beaver          Beaver, Utah                    43                -0.360           0.483
St. Lawrence    Ogdensburg (U.S.A.-Canada)      97                -0.286           0.705
Missouri        Sioux City, Iowa                58                -0.187           0.590
Quinault        Quinault Lake, Wash.            46                -0.176           0.189
Garonne         Mas d'Argenais, France          42                -0.156           0.738
Birs            Muenchenstein, Switzerland      41                -0.156           0.168

Compiled from (19).

The  question  arises  about  the  virtue  of  using 
physically  justified  but  mathematically  complicated 
models  of  hydrologic  processes  in  engineering 
design,  instead  of  simple  models  chosen  on  the 
basis  of  their  formal  "good  fit"  of  the  observed 
phenomenon. 

Let us illustrate this problem by examining the
impact of various models of the distribution of annual
runoff on the stationary solution of the storage equation
in the form:

$$R = f(S, D) \qquad (6)$$

where S is reservoir storage, D is reservoir draft,
and R is the stationary value of reliability (complementary
variable to risk of failure).

The above form of the storage equation (generally,
any two of the variables R, S, D can be chosen as the
independent ones) has been chosen because of its
suitability for testing the stability and dependability
of optimum solutions of storage problems.

Consider a specific type of water use with a given
ideal value of water demand (for instance, for a given
acreage of land to be irrigated or for a community
with a given number of residents) and a complete
set of economic parameters, and assume that the
water is to be supplied from a reservoir. The
problem then will be to find an optimum storage
capacity of this reservoir. This can be done by simulating
the system performance for a set of storage
capacities over a sufficiently long period of time and
by maximizing the net economic benefits of the
system. Having the optimum storage capacity S for
the given demand, or draft, D, we can find the corresponding
reliability, which can be regarded as the
"optimum reliability."²

After  the  completion  of  the  project,  the  "relia- 
bility" is  in  fact  the  only  noneconomic  parameter 
that  can  be  used  for  evaluating  the  actual  perform- 
ance of  the  system  with  respect  to  the  theoretical 
optimum  performance.  Through  the  difference 
between  the  optimum  and  the  actual  reliability  we 
can  then  judge  the  adequacy  of  the  runoff  model  used 
in  the  optimization  procedure.  By  the  same  token, 
the  difference  in  reliability  resulting  from  using  two 
different  runoff  models  can  serve  for  the  testing  of 
sensitivity  to  model  types,  and  stability,  of  an 
"optimum"  solution  of  the  storage  equation  since 
the  variables  in  equation  6  can  be  regarded  as  those 
pertaining  to  an  economically  based  optimum. 
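
The simulation-based search described above (routing a long inflow record through a set of candidate storage capacities and keeping the capacity with the greatest net benefit) can be sketched as follows. This is an illustration only, not the computation used in the paper: the annual time step, the independent gamma inflows with Cv = 0.3, and the economic parameters `unit_cost` and `shortage_penalty` are assumptions introduced for the example.

```python
import random

def gamma_inflows(n, mean=1.0, cv=0.3, seed=0):
    """Independent annual flows from a two-parameter gamma distribution."""
    rng = random.Random(seed)
    shape = 1.0 / cv ** 2          # gamma shape k gives Cv = 1/sqrt(k)
    scale = mean / shape           # so that k * scale equals the mean
    return [rng.gammavariate(shape, scale) for _ in range(n)]

def simulate(storage, draft, inflows):
    """Route inflows through a reservoir that starts full.

    Returns (reliability, mean annual shortage), where reliability is
    the fraction of years in which the full draft was supplied.
    """
    content = storage
    failures = 0
    shortage_total = 0.0
    for q in inflows:
        content += q - draft
        if content < 0.0:          # draft not fully met this year
            failures += 1
            shortage_total -= content
            content = 0.0
        elif content > storage:    # excess water spills
            content = storage
    n = len(inflows)
    return 1.0 - failures / n, shortage_total / n

def net_benefit(storage, draft, inflows, unit_cost=0.02, shortage_penalty=2.0):
    """Hypothetical annual net benefit (assumed form, not from the paper):
    water supplied, minus a penalty on shortages, minus storage cost."""
    _, shortage = simulate(storage, draft, inflows)
    return (draft - shortage) - shortage_penalty * shortage - unit_cost * storage

if __name__ == "__main__":
    flows = gamma_inflows(20000)
    draft = 0.9
    candidates = [0.1 * k for k in range(31)]
    best = max(candidates, key=lambda s: net_benefit(s, draft, flows))
    print("optimum storage:", best)
    print("corresponding reliability:", simulate(best, draft, flows)[0])
```

Repeating the final step with a second inflow generator (a different distribution type fitted to the same sample) and comparing the two reliabilities at the same storage and draft gives the kind of sensitivity test described in the text.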

In the present example, 40-year samples of mean annual flows
from 13 Canadian rivers were used. Each sample was normalized
and fitted with a power transformation of the gamma
distribution, which fit was considered the basic one. Along with
it, a normal, a gamma, a lognormal, and a three-parameter
lognormal (one with location parameter) model were also fitted
to each sample; two examples of the five fits are shown in
figures 3 and 4. Finally, all the fitted distributions were
standardized to Cv = 0.3 to avoid the influence of the variance.
To eliminate the influence of different time-behavior of the 13
series, all of them were regarded as random.

² It has been observed that stationary reliabilities of
economically optimal solutions of reservoirs serving the same
purpose tend to group within relatively narrow bounds (2, 11,
13). This has led to the concept of "optimum reliability," a
design parameter, or a standard, that can be used whenever
detailed optimization analysis is not possible.

The gamma-3 fit was then used in each particular case to
calculate storage capacities required for draft D varying from
0.5 to 0.95 of the mean and for reliability R from 80 to 98
percent. For each particular river, this set of storage values,
together with the above set of drafts, was then used to
back-calculate reliabilities based on the other four
distribution models. To underline the point of the whole test
(the difference between a hydrologically best model and models
as simple as possible), the gamma-3 distribution was fitted by
minimizing its χ², while the other four models were fitted by
the method of moments (nevertheless, all the fits were
statistically acceptable in the sense that they could not be
rejected at the 5-percent significance level based on the
distribution of χ²).

The  differences  between  the  original  reliability 
(based  on  the  gamma-3  type)  and  those  obtained  from 
the  four  other  models  are  shown  in  figure  6  from 
which  two  important  conclusions  can  be  drawn. 

First of all, it is apparent that the types of distribution
models fitted to the sample do not cause appreciable differences
in reliability as long as their coefficients of asymmetry, Cs,
are close to that of the reference model. (Here approximately
the same as the empirical values listed in fig. 6.) Thus we see,
for instance, that the three-parameter lognormal fit yields
almost the same results as the gamma-3 type in spite of the fact
that the responsiveness of the former to Cs has been achieved by
the physically unjustified means of including a location
parameter in it. Moreover, in cases of negative skewness the
lower tail of the three-parameter lognormal type reaches below
zero, which is not only unjustified but totally unrealistic, and
yet the results (except for the most extreme case) are
practically indistinguishable from those based on the physically
justified gamma-3 type. The same pattern can be observed
throughout the whole spectrum. Thus the two asymmetric
two-parameter models show good agreement around their respective
values of $C_s = 3C_v + C_v^3$ and $C_s = 2C_v$, whereas the
normal fit, although physically unrealistic, gives good results
for Cs anywhere between -1 and 1.

Figure 6. Differences in reliability resulting from using the
given distribution model instead of the power-transformation of
the gamma type. The smallest circles indicate differences up to
0.5 percent; the largest circles differences of 2 percent and
more. Open circles indicate negative differences, and closed
circles positive differences.
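
The rigid skewness of the two-parameter models can be made concrete with the standard moment relations: Cs = 2Cv for the two-parameter gamma distribution, Cs = 3Cv + Cv³ for the two-parameter lognormal distribution, and Cs = 0 for the normal distribution. The short sketch below evaluates them at the standardized value Cv = 0.3 used in the test; the function names are introduced here for illustration only.

```python
def skew_gamma(cv):
    """Skewness of a two-parameter gamma distribution: Cs = 2 Cv."""
    return 2.0 * cv

def skew_lognormal(cv):
    """Skewness of a two-parameter lognormal distribution: Cs = 3 Cv + Cv^3."""
    return 3.0 * cv + cv ** 3

def skew_normal(cv):
    """The normal distribution is symmetric whatever its Cv."""
    return 0.0

if __name__ == "__main__":
    cv = 0.3
    for name, f in [("normal", skew_normal),
                    ("gamma", skew_gamma),
                    ("lognormal", skew_lognormal)]:
        print(name, round(f(cv), 3))
```

At Cv = 0.3 the three models are therefore locked to Cs values of 0, 0.6, and about 0.93, whatever the skewness of the sample.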

The second conclusion can be formulated as follows. Because of
the importance of the coefficient of asymmetry, the
two-parameter models can introduce substantial errors into the
solution of the storage equation because of their rigid
skewness, mainly for drafts lower than 90 percent of mean flow
and for reliabilities above 90 percent. In this range, the error
in reliability can easily exceed ±2 percent, which is equivalent
to errors in long-term storage from ±15 to ±50 percent and more.
The relationship between a 2 percent error in reliability and
the corresponding errors in storage is shown (for gamma
distribution with Cv = 0.3) in figure 7, where the shaded area
shows the region where 2 percent reliability errors can easily
be exceeded.

The  overall  picture  seems  to  be  rather  grim. 

On the one hand, it is evident that the conceptual correctness
and physical justifiability of the runoff distribution model
have no appreciable influence on the numerical solution of the
storage equation, because all the theoretically important
conceptual differences are completely overshadowed by the
numerical value of a single parameter.

On the other hand, the impossibility of reliably estimating the
true value of this parameter makes the optimization of a storage
reservoir an exercise in futility, because to determine the
optimum size of a reservoir with an error of ±50 percent or more
is not much of an achievement.

Thus from the "applied" point of view it seems more fruitful to
aim at an improvement of the estimates of population parameters
(here the importance of an efficient estimation method comes
into the picture) rather than of the theoretical form of
distribution models. However, following this course we soon
learn to appreciate the importance of these theoretical models.

Figure 7. Error in storage capacity corresponding to a 2 percent
error in reliability (random gamma input, Cv = 0.3).

Short  of  large  samples  of  data,  the  statistician 
washes  his  hands  of  responsibility  in  evaluating 
alternatives,  considering  most  differences  insignifi- 
cant in  the  light  of  the  actual  information  content  of 
the  data. 

But the problem is that a statistically insignificant difference
in a parameter, say the coefficient of skewness, may lead to a
difference of the order of millions of dollars in reservoir
cost, which no engineer can treat as an insignificant one.

Not being able to base his decision for a particular alternative
on statistical evidence, the engineer has no choice but to
select one which is most plausible from the hydrological
(physical) viewpoint, even if the hydrologic information is only
of a qualitative nature. Here we come to the point where the
physical justifiability of a particular concept may play a
significant role. For instance, results similar to those
discussed under "The Probabilistic Distribution of Runoff" would
enable him to see in which cases caution is needed in ignoring
the negative value of empirical skewness, even if statistical
tests suggest that it may be just a result of a sampling error.

Thus we find after all that, quite unexpectedly, the
sophisticated but physically justified concepts may have some
virtue not only as a "thing in itself" but also in a very real
applied sense.

There  remains,  however,  one  more  point  to 
examine.  The  above  optimism  rests  on  the  assump- 
tion that  the  stationary  solution  of  storage  equation 
is  an  adequate  basis  for  reservoir  design. 

In  the  author's  opinion  (8),  optimization  of  a  single 
reservoir  based  on  a  stationary  solution  of  the 
storage  equation  is  of  little  practical  value.  In  order 
that  the  optimum  performance  materialize,  the  reser- 
voir would  have  to  operate  for  100  years  or  more 
under  the  originally  assumed  technological  and 
economic  conditions,  which  is  highly  unrealistic. 
Moreover, it certainly makes a great difference whether, say,
the 10 failure years expected during a 100-year period of
operation at a 90-percent reliability occur during the first 20
years or towards the end of the period. In the first case, the
project could collapse economically, while any number of failure
years 100 years from now is completely irrelevant because by
then the present criteria of optimality will no longer hold, the
reservoir may serve a completely different purpose or may not
even exist any more, and because the economic unity between
losses at one time and their compensation by gains at some other
time disintegrates with time (for example, there is probably no
instance in which the prosperity of the sixties could
economically be regarded as a compensation for the crisis in the
thirties; the crisis "healed" much sooner and the present
economy is independent of it, although there is only a 40-year
timelag involved).

If the reservoir-design problem is approached from the viewpoint
of a relatively short economic time horizon, say 20 to 50 years,
the picture changes considerably. In such a case, even if we can
precisely describe the runoff process, both in terms of its
qualitative properties and quantitative parameters, its behavior
during the short period in question will necessarily exhibit
considerable departures from the long-term properties and cause
substantial deviations from optimum performance characterized by
the stationary value of reliability. Instead of this single
value we will now have a distribution of reliabilities whose
dispersion about the stationary value will be the greater the
shorter the time period.

An example of distribution of reliability for various lengths of
periods is shown in figure 8. In this case, taken from (9), the
runoff process is considered random and having a two-parameter
gamma distribution with Cv = 0.3, reservoir draft D = 0.90, and
stationary reliability R = 90 percent. All the distribution
functions shown relate to reservoir operation starting with a
full reservoir. It can be seen that in spite of the presumed
perfect knowledge of the input (runoff) process, the uncertainty
about the actual performance of the reservoir is considerable.
For instance, there is about a 25 percent chance that during a
period of 20 years the actual performance reliability will drop
by more than 5 percent below the stationary (design) level,
which is equivalent to saying that to achieve the optimum
performance a storage capacity 43 percent larger than the design
value³ would be needed with odds 1 : 3.

³ Svanidze's nomographs (18) were used for the conversion of
reliability into storage.

With such an instability, the stationary "optimum" solution
loses most of its practical value since the reliance on it would
mean, contrary to the whole purport of optimization, a very
risky venture. And yet, the degree of risk of the above order
must be regarded as the minimum representing the most optimistic
limit that can ever be hoped for. For in reality the underlying
model of runoff will never be perfect and, even if it were, its
parameters will never be completely free of errors. A picture of
reservoir performance, truly realistic in the sense that it does
not conceal the uncertainties and imperfections in the
underlying hydrological knowledge, would be represented by a
distribution of reliability based on a distribution of plausible
runoff models and on distributions of their parameters, and
would naturally have greater variance than that in the above
example.
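
The dispersion of finite-period reliability about its stationary value can be explored with a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions rather than a reconstruction of the figure 8 computation: inflows are independent gamma variates with unit mean and Cv = 0.3, the design storage is found by bisection on one long simulated record (the paper used nomographs instead), and the function names, horizon, and trial count are all hypothetical.

```python
import random

def reliability(storage, draft, inflows):
    """Fraction of years with the draft fully met, starting with a full reservoir."""
    content = storage
    failures = 0
    for q in inflows:
        content = min(content + q - draft, storage)   # spill above capacity
        if content < 0.0:
            failures += 1
            content = 0.0
    return 1.0 - failures / len(inflows)

def make_sampler(cv=0.3, seed=1):
    """Return a function drawing n independent gamma flows with unit mean."""
    rng = random.Random(seed)
    shape = 1.0 / cv ** 2
    scale = 1.0 / shape
    return lambda n: [rng.gammavariate(shape, scale) for _ in range(n)]

def storage_for(target, draft, inflows, lo=0.0, hi=5.0, steps=30):
    """Bisection for a storage whose long-run reliability reaches the target.

    Relies on reliability being non-decreasing in storage for a fixed record.
    """
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if reliability(mid, draft, inflows) >= target:
            hi = mid
        else:
            lo = mid
    return hi

def short_period_reliabilities(storage, draft, years, trials, sample):
    """Sorted reliabilities of many independent periods of the given length."""
    return sorted(reliability(storage, draft, sample(years)) for _ in range(trials))

if __name__ == "__main__":
    sample = make_sampler()
    design = storage_for(0.90, 0.90, sample(50000))
    rels = short_period_reliabilities(design, 0.90, 20, 2000, sample)
    below = sum(r < 0.85 - 1e-9 for r in rels) / len(rels)
    print("design storage (fraction of mean annual flow):", round(design, 3))
    print("share of 20-year periods more than 5 percent below design:", below)
```

Shortening or lengthening `years` shows how the spread of the empirical distribution changes with the length of the operating period.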

Thus,  abandoning  the  illusion  that  the  stationary 
solution  of  storage  equation  is  a  realistic  basis  for 
the  design  of  a  reservoir,  we  see  that  neither  a  cor- 
rect distribution  type  nor  a  correct  estimate  of  runoff 
parameters  can  remove  much  of  the  uncertainty  in, 
and  the  instability  of,  an  "optimum"  solution  of  the 
storage  problem. 

The  problem  then  arises  as  to  whether  the  picture 
of  reservoir  performance  that  can  be  obtained  on  the 
basis  of  stochastic  interpretation  of  the  runoff 
process  has  any  practical  usefulness  at  all. 

In  the  author's  opinion,  the  answer  is  yes  but  the 
usefulness  lies  in  other  directions  than  where  it  is 
being  sought. 

The  traditional  concept  of  an  optimum  solution 
for  a  single  reservoir  collapses,  as  it  appears,  either 
for  economic  or  for  hydrologic  reasons. 

A long-term optimization that could possibly take advantage of
stationary behavior of the runoff process fails because of the
nonstationary (and unpredictable) behavior of the economic and
social element.

A  short-term  optimization  that  could  rely  on  rela- 
tively constant  (or  at  least  predictable)  socioeconomic 
conditions  fails  because  of  the  unpredictability  of 
the  properties  of  a  single  short  realization  of  the 
runoff  process. 

The stochastic theory and the statistical properties of runoff
can be found to have practical value only if applied in a
statistically meaningful sense: to describe a multitude. Since
the multitude represented by the infinite time of a single
realization and leading to stationary solutions is excluded on
economic grounds, the only other direction leads towards the
multitude of realizations within a finite time, in other words
towards a large number of reservoirs operating within the same
period of time.

In  such  a  case,  the  individual  stationary  optima 
could  be  regarded  as  ensemble  averages,  and  the 
distribution  of  their  descriptors,  say  a  reliability 
parameter,  would  describe  a  real  situation  over  a 
large  territory  rather  than  a  merely  theoretical 
case  based  on  hypothetical  replicas  of  a  nonrepli- 
cable  situation,  as  it  is  in  figure  8. 

Territorial  optimization  could  be  materialized 
through  an  insurance  policy  which  would  compen- 
sate a  water  consumer  for  losses  exceeding  his 
target  risk  of  failure. 

Conclusions 

Contrary  to  deterministic  hydrology  whose  mis- 
sion is  to  extract  the  causative  skeleton  of  a  hydro- 
logic  process,  the  mission  of  statistical  and  stochas- 
tic hydrology  is  to  objectively  describe  its  inherent 
uncertainties.  Very  often  this  is  not  appreciated 
enough  and  stochastic  models  are  used  with  the 
intention  of  providing  deterministic  answers.  This 
is  wrong  in  principle.  No  stochastic  model  can  "fill 
in"  the  missing  data  in  a  historical  series,  no  stochas- 
tic model  can  guarantee  that  the  optimum  storage 
capacity  that  it  yields  is  the  true  optimum  which  one 
would  arrive  at  if  using  the  flow  sequence  that  will 
actually  materialize  in  the  future. 

Nevertheless,  this  is  how  stochastic  models  are 
often  being  applied,  and  the  deterministic  misuse 
of  their  answers  is  justified  by  saying  that  "the  engi- 
neer needs  some  concrete  answer  because  he  has 
to  come  up  with  a  concrete  solution."  There  is 
only  one  reply  to  the  argument:  If  there  exists  an 
inherent  uncertainty  then  any  "concrete"  answer  is 
potentially  wrong.  The  real  problem  for  the  engineer 
is,  therefore,  not  to  get  "a  concrete  answer"  but 
to  learn  how  to  incorporate  the  unremovable  un- 
certainty into  his  solution. 

Nowadays in hydrology, and the more so in engineering,
uncertainty is still regarded as a regrettable imperfection in
the body of knowledge, as it was in 19th-century physics. As in
physics, it also seems that in hydrology and engineering (which
are, after all, only offspring of physics) progress lies not in
trying to remove the uncertainty at any cost but in learning how
to make it one of the legitimate elements of our concepts. After
having learned that, we can discard the misnomers "deterministic
hydrology" and "statistical (and stochastic) hydrology" and can
come back to hydrology.

APPLICATION OF SENSITIVITY ANALYSIS TO RESERVOIR DESIGN AND
WORTH OF STREAMFLOW DATA

By M. E. Moss and D. R. Dawdy¹


Abstract 

The  effects  of  four  nonhydrometric  variables 
upon  the  expected  net  benefit  of  a  multipurpose 
reservoir  and  upon  the  worth  of  streamflow  data 
used  in  the  design  of  the  reservoir  have  been 
investigated  by  sensitivity  analysis.  The  four  vari- 
ables are:  cyclicity  in  water  demand,  flood-pool- 
operation  policy,  errors  in  evaporation  data,  and 
methodology  used  in  designing  the  reservoir. 
Valuation  of  the  level  of  information  available  with 
no  streamflow  record  at  the  reservoir  site  was  in- 
cluded as  an  inherent  step  in  the  analysis.  This 
valuation  was  performed  by  means  of  a  stochastic 
streamflow  sequence  for  which  monthly  statistics 
were  estimated  from  multiple  regression  models. 
Stochastic  sequences  of  streamflow  were  used  to 
minimize  the  complications  introduced  by  sampling 
errors  if  actual  sequences  had  been  used.  Analyses 
of  the  sensitivity  of  worth  of  data  were  used  to  define 
those  parameters  derived  from  the  above  variables 
that  must  be  taken  into  account  in  relations  that 
generalize  worth  of  streamflow  data. 

Introduction 

The  worth  of  streamflow  data  is  directly  related  to 
the  uses  to  which  it  is  put  and  to  the  pertinent  infor- 
mation which  it  contains.  In  a  less  direct  manner, 
worth  can  be  influenced  by  externalities  such  as 
political  and  social  constraints,  variability  in  other 
data  used  in  conjunction  with  the  streamflow  data, 
and  the  methodology  by  which  the  data  are  used, 
among  others.  The  degree  to  which  the  data  worth 
is affected by each of these factors can be investigated
separately by making use of sensitivity analysis. Sensitivity
analysis isolates the effects of a particular variable by
maintaining all other variables at constant values, while the
investigated variable is permitted to assume various values of
interest. The sensitivity of the output (in this case the worth
of streamflow data) is measured by the change in its magnitude
caused by a change in the variable being investigated.
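
The one-variable-at-a-time procedure just described can be written down generically. In the sketch below the helper `one_at_a_time` and the stand-in function `toy_model` are hypothetical; the toy model has no hydrologic meaning and merely plays the role of whatever computation produces the output of interest.

```python
def one_at_a_time(model, baseline, variable, values):
    """Vary one input over `values`, holding all others at their baseline.

    Returns a list of (value, output, change from baseline output).
    """
    base_out = model(**baseline)
    results = []
    for v in values:
        inputs = dict(baseline)        # copy so the baseline is never modified
        inputs[variable] = v
        out = model(**inputs)
        results.append((v, out, out - base_out))
    return results

def toy_model(a, b):
    """Purely illustrative stand-in for a design or data-worth computation."""
    return a * a + 3 * b

if __name__ == "__main__":
    baseline = {"a": 1, "b": 2}
    for row in one_at_a_time(toy_model, baseline, "a", [1, 2, 3]):
        print(row)
```

Each row reports the change in output attributable to the investigated variable alone, which is the quantity the analyses below tabulate for each of the four factors.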

Data  can  often  be  evaluated  by  attributing  to  it 
the  increase  in  net  benefit  resulting  from  a  decision 
formulated  with  the  data  as  a  basis  over  that  which 
would  be  expected  from  the  decision  made  without 
reference  to  the  data.  In  the  case  of  a  streamflow 
record  used  in  the  design  of  a  reservoir,  the  dif- 
ference in  the  expected  net  benefits  derived  from 
the  reservoir  based  on  two  designs,  one  incorpora- 
ting the  record  and  the  other  omitting  it,  would  be  an 
accurate  measure  of  its  worth.  Dawdy  and  others  (2) 
have  developed  a  technique  based  on  this  concept 
by  which  the  marginal  worth  of  streamflow  records, 
which is the increase of worth caused by an additional year of
record, can be computed. By an empirical fitting of the data
from Dawdy and others (2), Moss (7) proposed a relation for the
worth of data in reservoir design as a function of the record
length. In order to generalize the relation, it became necessary
to determine the effects of external variables, as well as the
streamflow parameters, on the worth of data. Sensitivity
analysis is well suited for this task.

This  paper  describes  the  sensitivity  of  the  design 
of  the  conservation  pool  of  a  multipurpose  reservoir 
and  the  associated  worth  of  streamflow  data  to  four 
variable  factors:  (1)  the  design  procedure  used  to 
define  the  reservoir  size,  (2)  monthly  evaporation 
rate,  (3)  the  phase  angle  between  the  cyclicities  of 
streamflow  and  demand,  and  (4)  the  operation 
policy imposed on the flood pool. A case study of a multipurpose
reservoir design for the Tygart Valley River near Belington,
W. Va., is used as the basis of these analyses. The optimum
reservoir size is predicated on a synthetic monthly streamflow
record of 500 years whose statistics closely match the sample
statistics of the 60-year streamflow record collected at the
above-mentioned site by the U.S. Geological Survey (5). The
monthly means range between a high of 1,570 c.f.s. in March and
a low of 187 c.f.s. in September; the monthly standard
deviations range between 662 c.f.s. in March and 198 c.f.s. in
September. The streamflow population of each month was assumed
to follow the Log-Pearson Type III probability distribution. The
record was synthesized with the U.S. Army Corps of Engineers'
computer program "Monthly Streamflow Synthesis" (9). This same
synthetic record is used in each analysis to eliminate the
effects of sampling errors from the results. Although the input
statistics of the synthesis are only sample statistics, they
have been accepted as population statistics for the purpose of
the study, which is to investigate the effects of
nonhydrometric variables.

The optimal reservoir size, which is the sum of a fixed flood
pool, a fixed dead storage, and an optimal conservation pool,
was determined by a reservoir routing program (10). The optimal
conservation pool is that volume which yields a prespecified
value of a shortage index which is defined as follows:

$$I = \frac{100}{N} \sum_{i=1}^{N} \left( \frac{S_i}{D} \right)^2 \qquad (1)$$

where $N$ is the number of years in the streamflow record; $S_i$
is the amount of demand that is not supplied in the year $i$;
and $D$ is the annual demand. Inclusion of the shortage index in
the program permits the occurrence of shortages whose aggregate
economic effect is less than the added cost of a reservoir
designed to prevent them. The optimal value of $I$ is assumed to
equal 0.25. Sensitivity of design and data worth to this
assumption is investigated as part of this study.
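
A short numerical illustration of the shortage index may help. The printed equation is not fully legible in this copy, so the sketch assumes the form commonly used with Corps of Engineers reservoir routing programs, I = (100/N) Σ (S_i/D)², which agrees with the definitions of N, S_i, and D given in the text; the function name and the sample record are hypothetical.

```python
def shortage_index(annual_shortages, annual_demand):
    """Shortage index, assumed form: (100 / N) * sum((S_i / D) ** 2).

    annual_shortages: unsupplied demand S_i for each year of the record
    annual_demand:    annual demand D, in the same units as the shortages
    """
    n = len(annual_shortages)
    if n == 0 or annual_demand <= 0:
        raise ValueError("need a non-empty record and a positive demand")
    return 100.0 / n * sum((s / annual_demand) ** 2 for s in annual_shortages)

if __name__ == "__main__":
    # Hypothetical 100-year record: one year with half the demand unsupplied.
    record = [0.5] + [0.0] * 99
    print(shortage_index(record, annual_demand=1.0))
```

Under this assumed form, a single 50-percent shortage in 100 years gives I = 0.25, the value adopted as optimal in the study. Because shortages enter as squares, the same index also results from roughly six 20-percent shortages per century, so the index tolerates frequent small deficits more readily than rare large ones.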

Input to the program in addition to the streamflow record
consists of net evaporation rate data, water demand data, and
flood pool availability data. Flood pool availability refers to
that portion of the maximum required flood pool that can be used
for conservation storage during nonflood-prone periods, and net
evaporation rate is the difference between average potential
evaporation and precipitation. The flexibility of the program
with respect to these variables permits a certain degree of
realism to be incorporated into the design. Realistic values for
each of these data sets, which are monthly values that are
recycled for each year of streamflow data that is routed by the
program, are given in table 1.

For the conditions stated above, the optimum reservoir size was
determined to be 102,100 acre-feet, of which 42,100 acre-feet
were the design conservation pool. Cost and benefit procedures
outlined by Dawdy and others (2) result in a cost of
$16,800,000  for  this  design.  If  water  is  assumed  to 
have  a  value  of  $50  per  acre-foot  and  a  discount 
rate  of  5  percent  is  applicable,  the  expected  value  of 
the  present  worth  of  the  benefits  for  the  conserva- 
tion uses  over  the  100-year  life  of  the  project  is 
$230,000,000.  The  expected  net  benefit  of  the  proj- 
ect is  $213,200,000  plus  the  present  worth  of  the 
flood  pool  benefits.  Flood  pool  benefits  are  assumed 
to  be  unaffected  by  the  external  variables  that  are  to 
be  examined,  and  are  therefore  ignored  in  the  follow- 
ing analyses. 
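
The figures in the preceding paragraph can be checked with the uniform-series present-worth factor. The sketch below rests on an assumption that the paper does not state explicitly: that the conservation benefit corresponds to the mean demand of table 1 (320 ft³/sec) being fully supplied every year. The unit conversion and function names are introduced for the example.

```python
SECONDS_PER_YEAR = 86400 * 365
CUBIC_FEET_PER_ACRE_FOOT = 43560

def present_worth_factor(rate, years):
    """Present worth of 1 unit received at the end of each year for `years` years."""
    return (1.0 - (1.0 + rate) ** -years) / rate

def annual_volume_acre_feet(flow_cfs):
    """Volume delivered in one year by a constant flow, in acre-feet."""
    return flow_cfs * SECONDS_PER_YEAR / CUBIC_FEET_PER_ACRE_FOOT

def conservation_benefit(flow_cfs, value_per_acre_foot, rate, years):
    """Present worth of supplying `flow_cfs` continuously (assumed benefit model)."""
    annual = annual_volume_acre_feet(flow_cfs) * value_per_acre_foot
    return annual * present_worth_factor(rate, years)

if __name__ == "__main__":
    factor = present_worth_factor(0.05, 100)
    benefit = conservation_benefit(320.0, 50.0, 0.05, 100)
    print("present-worth factor:", round(factor, 3))
    print("present worth of benefits, dollars:", round(benefit))
    print("stated net benefit, dollars:", 230_000_000 - 16_800_000)
```

Under that assumption the computed present worth comes out within a fraction of a percent of the stated $230,000,000, and subtracting the $16,800,000 cost reproduces the stated net benefit of $213,200,000.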

Table 1. Monthly values of variables for optimal design

Variable                                      Oct.  Nov.  Dec.  Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept.  Mean
Net evaporation, in inches                    -1    -1    -1    -3    -3    -2    -2    -3    -2    -2    -1    0      -1.75
Demand, in ft³/sec                            330   265   220   185   220   265   330   395   410   425   410   395    320¹
Flood pool availability, in acre-feet × 10⁻³  8     5     0     0     0     0     10    20    20    17    14    11     8.75

¹ Average demand is approximately equal to 50 percent of the
annual flow of the Tygart River at Belington, W. Va.

Information Base for Reservoir Design

Streamflow records at a particular dam site are not the sole
method of obtaining hydrometric information for the design of
the reservoir.

1. Rainfall-runoff models (3) can be used to extract useful
   information about streamflow from precipitation records.

2. The correlation structure between two or more streamflow
   sites permits transfer of information from station to
   station (4).

3. Regression models using basin characteristics as independent
   variables can provide information about the statistical
   characteristics of streamflow at a particular site (8, 1).

Because  use  of  one  or  more  of  these  methods  to 
obtain  design  data  would  result  in  a  design  whose 
expected  return  or  net  benefit  would  be  greater 
than  a  design  based  on  no  data,  the  value  of  per- 
fect knowledge,  which  would  be  exemplified  by  a 
streamflow  record  of  extremely  long  timespan, 
would  be  limited  to  the  difference  between  the 
expected  net  benefit  from  the  "perfect-knowledge" 
design  and  the  "current-knowledge"  design.  In 
order  to  estimate  this  limiting  value  and  investi- 
gate its  sensitivities,  a  second  synthetic  record  of 
streamflow,  which  is  representative  of  the  state  of 
"current-knowledge,"  was  generated.  The  gener- 
ating model  was  the  same  as  was  used  for  the  original 
record,  except  that  the  input  parameters  of  monthly 
means  and  standard  deviations  were  estimated  by 
regression  analyses  (5).  Because  regression  esti- 
mates of  the  skewness  of  the  distributions  were 
not  available,  the  logarithms  of  monthly  discharge 
were  assumed  to  be  symmetrically  distributed  about 
the  mean  logarithm  for  each  month.  This  assumption 
results  in  lognormal  distributions  for  the  second 
record  as  opposed  to  Log-Pearson  Type  III  for  the 
original  record.  The  regression  estimates  of  monthly 
mean  discharges  and  standard  deviations  were  con- 
verted to  estimates  of  the  mean  logarithms  and 
standard  deviations  of  logarithms  by  the  equations 
presented  by  Matalas  (6).  Lag-one  autocorrelation 
coefficients  for  the  logarithms  of  monthly  flows 
were  estimated  by  averaging  the  sample  autocor- 
relation coefficients  for  existing  streamflow  records 
in  the  vicinity  of  the  site  at  the  Tygart  Valley  River 
near  Belington. 
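
The construction of the "current-knowledge" record can be illustrated in two steps: converting a mean and standard deviation of flows into the mean and standard deviation of their logarithms, and generating lognormal flows with lag-one autocorrelation in the logarithms. The sketch uses the standard lognormal moment relations, which are assumed here to be equivalent to the equations of Matalas (6); it is not the Corps of Engineers program, and the autocorrelation value of 0.3 is purely hypothetical. The March values of 1,570 and 662 c.f.s. are the sample statistics quoted earlier, used only as sample input.

```python
import math
import random

def log_moments(mean, std):
    """Mean and standard deviation of ln(Q) for lognormal Q with given moments."""
    sigma2 = math.log(1.0 + (std / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)

def moments_from_log(mu, sigma):
    """Inverse of log_moments: mean and standard deviation of Q itself."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    return mean, mean * math.sqrt(math.exp(sigma ** 2) - 1.0)

def synthesize(monthly_mu, monthly_sigma, rho, years, seed=0):
    """Monthly lognormal flows whose standardized logarithms follow a
    lag-one autoregressive process with coefficient rho."""
    rng = random.Random(seed)
    z = rng.gauss(0.0, 1.0)
    flows = []
    for t in range(12 * years):
        m = t % 12
        flows.append(math.exp(monthly_mu[m] + monthly_sigma[m] * z))
        # Next standardized deviate keeps unit variance and correlation rho.
        z = rho * z + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    return flows

if __name__ == "__main__":
    mu, sigma = log_moments(1570.0, 662.0)
    print("log-space mean and standard deviation:", round(mu, 4), round(sigma, 4))
    flows = synthesize([mu] * 12, [sigma] * 12, rho=0.3, years=500)
    print("generated mean flow:", round(sum(flows) / len(flows), 1))
```

In practice each month would receive its own pair of log-space parameters; the same March pair is repeated twelve times here only to keep the example short.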

Use of the "current-knowledge" record in conjunction with the
external conditions specified in table 1 to design the reservoir
results in a required conservation pool of 25,000 acre-feet or
85,500 acre-feet of total capacity, which would cost
$14,700,000. The expected benefits to conservation uses of a
reservoir this size would be $226,300,000 or an expected net
benefit of $211,600,000 if flood-protection benefits are
ignored. The worth of a record that is long enough to yield
perfect knowledge about the streamflow character at the
reservoir site would be limited to the difference between the
optimal-design net benefits and the "current-knowledge" net
benefits or $1,600,000. The increased knowledge would thus
result in an improvement in design of less than 1 percent of the
expected net benefits. The information base for West Virginia
that is contained in the regression analyses is therefore very
high relative to the information level required for this
reservoir design.
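
The limiting worth of perfect streamflow knowledge follows directly from the quoted benefits and costs. The sketch below repeats that arithmetic; the cost of the "current-knowledge" design is taken as $14,700,000, the figure implied by the stated benefits of $226,300,000 and net benefit of $211,600,000. The names are introduced here for illustration.

```python
def net_benefit(benefit, cost):
    """Expected net benefit, flood-pool benefits ignored."""
    return benefit - cost

def worth_of_perfect_knowledge(optimal_net, current_net):
    """Upper limit on the worth of additional streamflow record."""
    return optimal_net - current_net

OPTIMAL = net_benefit(230_000_000, 16_800_000)   # perfect-knowledge design
CURRENT = net_benefit(226_300_000, 14_700_000)   # regression-based design
WORTH = worth_of_perfect_knowledge(OPTIMAL, CURRENT)
SHARE = WORTH / OPTIMAL                          # fraction of optimal net benefit

if __name__ == "__main__":
    print("worth of perfect knowledge, dollars:", WORTH)
    print("as a percentage of expected net benefit:", round(100 * SHARE, 2))
```

The difference is $1,600,000, about 0.75 percent of the optimal expected net benefit, consistent with the statement that the improvement is less than 1 percent.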

Effects  of  Evaporation  Rates 

Because  of  an  a  priori  assumption  that  both  the 
reservoir  design  and  the  worth  of  data  would  be 
rather  insensitive  to  evaporation  rate,  the  sensi- 
tivity analyses  of  this  factor  were  limited  to  two 
cases,  each  of  which  represented  large  departures 
from  the  evaporation  rates  inherent  in  table  1. 
The first case, evaporation rate equal to zero for each month,
when compared with the designs based on the realistic
evaporation rates, results in smaller designs for the
conservation pool with both the optimal record and the
regression-based record. For the optimal record the required
conservation pool was 39,900 acre-feet as contrasted with 42,100
acre-feet for the optimal record and realistic evaporation.
Ignoring evaporation thus results in an underdesign of the
conservation pool of this particular reservoir of 5 percent of
the required volume.

Designing the reservoir with the regression-based record results
in a conservation pool of 23,800 acre-feet if evaporation is
ignored. The difference in expected net benefits from the 39,900
acre-foot design and the 23,800 acre-foot design remains at
$1,600,000; therefore, the worth of the streamflow data is
unchanged for this case.

In  the  second  case,  an  evaporation  record  was 
assumed  that  would  be  representative  of  southern 
Texas,  where  average  annual  evaporation  is  about 
twice  that  of  West  Virginia.  A  comparison  of  the 
West  Virginia  and  Texas  potential-evaporation 
records  is  shown  in  figure  1.  Doubling  the  evapora- 
tion rate  resulted  in  an  increase  of  about  6  percent 
in  the  optimal  design  of  the  conservation  pool  from 
42,100  acre-feet  to  44,800  acre-feet.  Use  of  the  re- 
gression-based record  in  conjunction  with  the 
increased  evaporation  record  yielded  a  required 
conservation  pool  of  27,600  acre-feet.  The  difference 
in  expected  net  benefits  of  the  two  second-case 
designs was $1,900,000. Thus, the more critical
conditions of high evaporation influenced the worth
of the streamflow data in a positive manner.

Figure 1. Monthly potential evaporation (curves for Texas data and West Virginia data, plotted by month).
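The percentages quoted for the two evaporation cases follow directly from the pool sizes given in the text. The short Python sketch below reproduces that arithmetic; the function and constant names are illustrative only and do not come from the study.

```python
def percent_change(reference, alternative):
    """Change of `alternative` relative to `reference`, in percent."""
    return 100.0 * (alternative - reference) / reference


# Conservation-pool sizes quoted in the text, in acre-feet.
OPTIMAL_POOL = 42_100      # optimal record, realistic evaporation
ZERO_EVAP_POOL = 39_900    # optimal record, evaporation ignored
TEXAS_EVAP_POOL = 44_800   # optimal record, Texas evaporation record

underdesign = percent_change(OPTIMAL_POOL, ZERO_EVAP_POOL)   # about -5 percent
overdesign = percent_change(OPTIMAL_POOL, TEXAS_EVAP_POOL)   # about +6 percent
```

Both results agree with the rounded figures of 5 and 6 percent given above.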

Although  the  evaporation  data  seem  unimportant 
in  the  design  of  the  reservoir  and  have  minor  effects 
on  the  worth  of  streamflow  data,  they  do  have  in- 
trinsic value  of  their  own.  Water  lost  to  other  pos- 
sible downstream  users  by  the  process  of  evapora- 
tion should  be  considered  in  the  actual  cost  of  the 
project,  even  though  it  often  is  not.  If  this  aspect 
had been considered in the above example, a significant
increase in the cost of more than $2 million
would be included in the justification of the second
case. This cost is derived from the apparent additional
evaporation loss of 4,500 acre-feet annually.


This  water  was  assumed  to  have  a  unit  value  of 
half  of  that  of  the  project  because  it  would  manifest 
itself  as  increased  spill  from  the  reservoir  and  would 
require  additional  regulatory  facilities  to  bring  its 
value  up  to  that  of  the  controlled  water  delivered 
from the reservoir. Inclusion of the $2 million cost
would  have  had  no  effect  on  the  worth-of-data  values 
because  this  cost  would  be  applicable  to  both  the 
optimal  design  and  to  the  "current-knowledge" 
design. 

Seasonality of Demand

As  well  as  imparting  realism,  seasonal  variation 
of  demand  in  the  model  requires  consideration  of 
two further factors; both the range of the variation
and  its  phase  relation  to  the  seasonal  distribution 
of  streamflow  can  be  in  error.  In  this  study  only 
the  latter  possibility  is  investigated. 

Figure  2  represents  the  relation  of  seasonal  de- 
mand to  seasonal  streamflow  in  the  optimal  case. 
A phase angle, φ, of -4.2 months is found between
the first harmonic of demand and the first harmonic
of the mean logarithm of monthly streamflow. The
negative value of φ indicates that the peak demand
is  chronologically  after  the  peak  mean  monthly 
streamflow.  This  phase  angle  can  be  expected  to 
have  a  major  effect  on  the  amount  of  within-year 
storage  that  is  required  to  retain  the  peak  flows  for 
use  during  the  period  of  high  demand.  By  shifting 
the  phase  angle  of  demand  while  holding  the  other 


factors  constant,  sensitivity  analysis  can  illustrate 
the effect of φ on the required conservation storage.
In  figure  3,  it  is  seen  that  the  sensitivity  of  total 
conservation  storage  to  the  phase  angle  is  very 
high, which indicates that, for development of the
Tygart Valley River, within-year storage may be
as important as over-year storage.
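The phase angle between two monthly cycles can be estimated from their first harmonics. The Python sketch below is a minimal illustration, not the computation used in the study: it takes the first Fourier coefficient of each 12-value series, finds the month at which each fitted harmonic peaks, and reports the difference in months. The sign convention (negative when demand peaks after streamflow) is inferred from the text, and the function names are hypothetical.

```python
import cmath
import math


def first_harmonic_peak_month(values):
    """Fractional month index (0 <= peak < n) at which the first harmonic
    of an n-value periodic series reaches its maximum."""
    n = len(values)
    # First Fourier coefficient; a constant offset in the series cancels out.
    coeff = sum(x * cmath.exp(-2j * math.pi * t / n)
                for t, x in enumerate(values))
    # For x_t = cos(2*pi*(t - p)/n) the coefficient has phase -2*pi*p/n.
    return (-cmath.phase(coeff) * n / (2.0 * math.pi)) % n


def phase_angle_months(demand, log_streamflow):
    """Peak month of the streamflow harmonic minus peak month of the demand
    harmonic, wrapped to the interval [-n/2, n/2)."""
    n = len(demand)
    diff = (first_harmonic_peak_month(log_streamflow)
            - first_harmonic_peak_month(demand))
    return (diff + n / 2.0) % n - n / 2.0


# Synthetic example: streamflow harmonic peaks at month 3.0,
# demand harmonic peaks 4.2 months later.
flow = [math.cos(2 * math.pi * (t - 3.0) / 12) for t in range(12)]
demand = [10 + 2 * math.cos(2 * math.pi * (t - 7.2) / 12) for t in range(12)]
phi = phase_angle_months(demand, flow)
```

Shifting the demand series while holding the streamflow series fixed changes only `phi`, which is the kind of experiment the sensitivity analysis describes.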

Figure 4 illustrates the effect of φ on the maximum
worth  of  a  streamflow  record  at  the  Tygart-Valley- 
River-near-Belington  site,  which  was  again  com- 
puted as  the  difference  in  expected  net  benefits 
between  designs  with  the  optimal  record  and  with 
the  regression-based  record.  Computations  of  maxi- 
mum worth  were  made  for  each  phase  angle  rep- 
resented by  a  point  on  the  curve.  The  worth  of  data 
is associated with the phase angle in a logical
manner. The less critical relation of demand and
hydrology (in phase or nearly so) results in less
value attached to the streamflow record, whereas
in more critical conditions (out of phase) the value
of the records is greater. Both figures 3 and 4
illustrate the fact that not only is the level of demand
on a stream important, but also that the type of
demand, which controls its seasonality, is important
in the plans for developing the stream.

Figure 2. Cyclicities of streamflow and demand.

Figure 3. Relation of conservation storage and demand phase angle.

Effects of Errors

The  two  previous  sections  have  treated  evapora- 
tion rates  and  demand  cyclicity  as  known  quan- 
tities and  looked  at  their  effects  on  the  reservoir 
design  and  worth  of  data.  In  reality,  these  quantities 
are  not  known,  and  estimates,  which  contain  some 
degree  of  error,  must  be  used  in  the  actual  design 
of  the  reservoir.  Such  errors  result  in  a  reduction 
of  the  net  benefit  of  the  project  because  their  in- 
fluence on  the  reservoir  design  forces  it  away 
from  the  optimum  for  the  true  conditions.  Sensitivity 
analysis  can  be  used  to  define  the  reduction  in 
benefit  if  both  the  true  conditions  and  the  magni- 
tudes of  the  errors  are  known.  Expected  effects 


of  the  errors,  however,  cannot  be  stated  unless 
their probability distributions are also specified.

If  the  data  of  table  1  are  considered  as  the  true 
data  set,  errors  in  evaporation  rates  have  very  little 
effect  on  the  expected  net  benefit.  The  relation  of 
change  in  benefit  to  evaporation-rate  errors  was 
investigated  for  the  error  of  assuming  no  evapora- 
tion and  for  assuming  the  Texas-evaporation  rate 
of  figure  1.  Both  of  these  errors  yielded  decreases 
in expected net benefits of less than $50,000, which
were negligible percentages of the total of
$213,200,000 for the project.

The changes in expected net benefits from
the project can, however, be seen to vary significantly
with incorrect values of φ in figure 5.
Values of net benefit used to define figure 5 are
based on the correct value of φ being -4.2. Several
interesting observations can be made based on
this analysis. Underdesign, as illustrated between
φ = -4.2 and φ = 3.4, has a much more deleterious
effect than overdesign (3.4 < φ < 6 or
-6 < φ < -4.2). The maximum overdesign,
φ = 6, which results from streamflow and estimated
demand being 180° out of phase, yields a reduction
of only $500,000, while maximum underdesign
(streamflow and estimated demand in phase) results
in a $10 million loss in net benefits.

Figure 4. Relation of worth of streamflow record and phase angle of demand.

A point evolves at φ = 3.4 at which errors in
some  seasons  compensate  for  those  of  other 
seasons,  and  a  reservoir  size  equal  to  the  optimum 
is  defined.  Because  the  reservoir  size  equals  the 
optimum,  no  reduction  in  net  benefits  can  be 
attributed  to  making  an  error  of  this  size;  however, 
the  likelihood  of  making  this  particular  error  would 
be  very  small. 

Effects  of  Flood  Pool  Operation 

Availability of the design flood pool for auxiliary
conservation  storage  during  nonflood  seasons  is 
usually  fixed  rather  arbitrarily  by  the  designer  of 


the flood pool. A realistic provision for flood pool
use,  given  in  table  1,  was  specified  for  the  optimum 
design.  If  a  more  conservative  schedule  of  avail- 
ability is  used,  the  size  of  the  optimum  conservation 
pool  will  be  increased,  and  the  resulting  increase 
in  cost  will  cause  a  reduction  in  the  net  benefits 
of  the  project  if  no  further  flood  control  benefits 
are  accrued. 

Sensitivity  of  the  expected  net  benefits  to  the 
constraint  of  flood  pool  availability  was  tested  at 
two  levels:  Case  I  — flood  pool  availability  reduced 
to  half  that  of  the  optimal  case,  and  case  II  — flood 
pool  unavailable  for  intentional  conservation  stor- 
age during  the  entire  year.  The  results  of  this 
analysis  are  given  in  table  2. 

Constraining  the  flood  pool  results  in  overdesign. 
This  type  of  overdesign  yields  larger  decreases  in 
the expected net benefit than previous examples of
overdesign,  because  in  this  situation  there  are  no 
compensating  increases  in  total  benefits  to  offset 
the  increased  project  costs. 

If  the  flood  pool  is  constrained  to  flood-prevention 
operations  entirely,  the  worth  of  streamflow  data 
in  the  design  of  the  conservation  pool  is  slightly 
increased  from  $1,600,000  to  $1,700,000  for  the 
optimum  conditions.  This  increase  is  generated  by 


a  higher  information  requirement  associated  with 
the  more  critical  conditions  of  the  added  constraint. 

Effect of Methodology

Use of the shortage-index concept in determining
the size of the conservation pool immediately
raises the question of its comparison with the more
ingrained methods of design such as the mass-curve
technique. By determining the minimum reservoir
size that yields an expected shortage index of zero,
results of the mass-curve technique can be approximated.
This approximation is an improvement of
the standard mass-curve approach because it accounts
for the seasonality of demand and evaporation
more accurately.

Figure 5. Relation of expected net benefit and phase angle of demand (horizontal axis: φ, in months, from -6 to 6).

Table 2. Effect of flood pool availability

Case            Conservation pool   Project cost   Decrease in net benefit
                (acre-feet)         (dollars)      (dollars)
Optimal case    42,100              16,800,000
Case I          51,000              18,000,000     1,200,000
Case II         60,300              19,000,000     2,200,000

The  minimum  conservation  pool  for  the  Tygart 
Valley  River  that  would  yield  a  shortage  index  of 
zero  with  all  other  conditions  being  at  their  optimum 
was  89,500  acre-feet,  which  results  in  an  expected 
net  benefit  of  $209,000,000.  Costs  of  providing  the 
larger  and  more  conservative  reservoir  therefore 
outweigh  the  added  benefits  of  fewer  and  smaller 
deficiencies  in  supply  by  $4,200,000,  which  is  the 
difference  in  net  benefits  for  the  two  design  pro- 
cedures. It  must  be  stressed  that  this  difference  has 
within  it  the  assumption  that  the  shortage-index 
design with SI = 0.25 provides the true optimum.
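The idea of sizing storage so that no shortage ever occurs over the record can be sketched with the sequent-peak algorithm, a standard tabular counterpart of the mass-curve analysis. This is a swapped-in illustration, not the shortage-index program used in the study: it ignores evaporation and flood-pool rules, and the inflow and demand numbers are made up.

```python
def sequent_peak_storage(inflows, demands):
    """Minimum storage that meets every demand with no shortage.

    `inflows` and `demands` are equal-length sequences in the same volume
    units per period. The record is traversed twice so that a critical
    drawdown period spanning the end of the record is also captured.
    """
    deficit = 0.0    # current drawdown below a full reservoir
    required = 0.0   # largest drawdown seen so far
    for _ in range(2):
        for inflow, demand in zip(inflows, demands):
            deficit = max(0.0, deficit + demand - inflow)
            required = max(required, deficit)
    return required


# Hypothetical four-period record: wet, dry, dry, wet; constant demand of 3.
storage = sequent_peak_storage([5, 1, 1, 5], [3, 3, 3, 3])
```

A design of this kind meets the full demand regardless of the marginal cost of the last unit of storage, which is the contrast with the economically optimized shortage-index design drawn in the text.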

Strict  adherence  to  the  mass-curve  technique  of 
reservoir  design  leaves  no  room  for  economic 
optimization  of  the  design.  By  this  method,  the 
marginal  cost  of  developing  the  supply  is  expended 
no  matter  what  its  magnitude  if  the  project  has  an 
acceptable  benefit-cost  ratio.  This  is  equivalent  to 
an  implicit  assumption  of  the  mass-curve  technique, 
that  the  marginal  worth  of  water,  which  is  the  worth 
of  the  last  unit  of  water  provided  by  the  project,  at 
the  planned  level  of  development  is  extremely  high. 
Because  of  this  fact,  the  methodology  employed 
previously  to  define  the  worth  of  streamflow  data 
yields  a  value  that  approaches  infinity  for  the  worth 
of  streamflow  data  used  in  the  mass-curve  design. 

Because  of  the  huge  disparity  between  the  meas- 
ures of  data  worth  in  the  mass-curve  technique 
(SI = 0) and the shortage-index method (SI = 0.25),
sensitivity  of  data  worth  to  other  values  of  shortage 
index  was  tested.  Results  of  these  tests  and  an 
approximate trend line are given in figure 6. It can
be seen that in the vicinity of SI = 0.25, worth is
relatively insensitive to choice of optimum shortage
index. However, as zero shortage index is approached
or as the shortage index becomes large,
sensitivity is increased. The rapid rise in worth of
data was explained previously. The less rapid decrease
in worth at values of shortage index greater
than 0.3 can be attributed to the less binding
constraints that result from the less stringent
demand function.

Figure 6. Sensitivity of worth of data to optimum shortage index (horizontal axis: shortage index, 0 to 0.3).

Conclusions 

The  sensitivity  of  worth  of  streamflow  data  in  a 
reservoir  design  to  each  of  four  variables  was 
investigated. The resulting hierarchy from most
sensitive  relation  to  least  is  presented  in  table  3. 
The  hierarchy  results  are  for  a  particular  type  of 
hydrology  and  a  particular  level  and  type  of  develop- 
ment. Attempts  to  generalize  worth  of  data  studies 
must  consider  at  least  those  variables  that  ranked 
high  in  the  above  hierarchy.  Other  hydrologic 
regimens  and  levels  of  development  should  be 
investigated  by  the  procedures  of  this  study  to 
verify  or  expand  the  list  of  variables  that  influence 
worth  of  streamflow  data. 

Worth-of-data  values  were  found  to  be  higher  for 
more  critical  conditions  of  a  particular  variable 
than  for  more  lax  conditions.  This  fact  was  borne 
out  for  each  of  the  variables. 

The  effects  of  errors  in  the  phase  angle  of  demand 
on  the  expected  net  benefit  of  the  reservoir  project 
were  found  to  be  significant.  For  this  reason  and 
because  of  the  prime  status  of  this  variable  in  the 
hierarchy above, it would seem that a followup study
of  the  sensitivities  to  the  amplitude  of  the  demand 
cycle  would  be  in  order. 


Table 3. Hierarchy of sensitivities

Rank   Variable                   Range of worth of data
                                  (millions of dollars)
1      Phase angle of demand      0-1.8
2      Methodology                1.4-1.9
3      Evaporation rate           1.6-1.9
4      Flood pool availability    1.6-1.7


Literature Cited
MULTISITE STREAMFLOW SIMULATION OF TRUCKEE RIVER,
NEVADA

By V. L. Gupta and J. W. Fordham


Abstract 

This  paper  details  the  problems  associated  with 
conducting  sequential  simulation  studies  of  monthly 
streamflow. The study focuses upon a configuration
of six gaging stations in the Truckee River system.
The concepts of simulation are structured in such a
manner that both the serial correlations and the cross
correlations of the station-to-station dependence are preserved.
The  results  are  investigated  in  terms  of  comparing 
the  statistical  properties  of  the  simulated  sequences 
and  the  historic  flow  inputs.  Some  of  these  proper- 
ties include  mean,  standard  deviation,  and  skew- 
ness;  correlograms;  and  frequency  distributions. 
Principal  conclusions  are  outlined. 
Introduction

Simulation,  as  an  assemblage  of  tools  for  explor- 
ing the  behavior  and  response  of  physical  systems, 
has  a  relatively  long  history.  However,  only  in  recent 
years  has  its  use  become  widespread  as  a  manage- 
ment tool  in  water  resources  planning. 

In  general  terms,  simulation  is  defined  as  a 
process  which  enables  the  duplication  of  the  essence 
of  a  system  or  activity  without  necessarily  attaining 
reality  itself.  A  multitude  of  examples  can  be  cited 
to illustrate the applicability of simulation. These
encompass analog and digital simulations and
physical (or iconic) models.

In  the  context  of  water  resources  management, 
objectivity  in  decisionmaking  is  customarily 
attained  by  full  utilization  of  historically  available 
hydrologic  as  well  as  economic  data.  Quite  often 
systems  are  characterized  by  inadequate  stream- 
flow  records.  If  managerial  decisions  are  based  on 
such  inadequate  traces  of  information,  the  response 
of  the  regional  system  cannot  be  postulated  under 
a  variety  of  possible  alternatives  and  "equally 
likely"  streamflow  regimen  within  the  system.  For 
instance, the available hydrologic record may
lack a critical sequence of years of low
or high runoff, and the most severe drought or flood
in such short-term records may not reflect the
statistical features of the streamflow patterns.

In  the  realm  of  river  basin  management,  water 
resources  planners  can  effectively  and  efficiently 
evaluate  managerial  policies  if  provided  with  in- 
formation related  to  consequences  of  possible 
alternative  sequences  of  streamflow,  in  addition  to 
historic  data.  Furthermore,  system  performance  can 
be  examined  within  a  "confidence  region"  to  reflect 
the  associated  "variability"  of  behavior. 

Consequently,  the  development  of  synthetic 
streamflows  from  historic  data  is  of  fundamental 
significance  within  the  water  resources  planning 
horizon.  Such  synthetic  streamflows  should  reflect 
the  statistical  characteristics  of  the  corresponding 
historic  flows.  Several  terms  for  identifying  such 
tasks  appear  in  the  literature.  Some  of  these  are 
"stochastic  simulation,"  "stochastic  hydrology," 
and  "operational  hydrology." 

The  scope  of  this  paper  is  limited  to  reporting 
(1)  a  case  study  which  involves  monthly  streamflow 
simulation  on  a  six-satellite  station  configuration 
and  (2)  scrutiny  of  results. 
Simulation Techniques

Early attempts at sequential generation of
streamflows at a single site involved an
array of cards bearing runoff volumes, one value per
card; the cards were shuffled and drawn one at
a time on a repetitive basis, and the values so obtained
were taken as the simulated sequence.
Obviously, this approach did not gain long-term
acceptance. Grounds for an overall rejection of this
procedure rest on the observation that future events
will not possess identically the same magnitudes
as those of historic events.

A  basic  improvement  over  the  aforementioned 
method  of  using  cards  consisted  of  making  use  of 
a  table  of  random  numbers  to  synthesize  a  relatively 
long  sequence  of  streamflows  having  the  same  mean 
and  standard  deviation  as  that  of  historic  record 
and  assuming  a  normal  distribution.  Such  an  ap- 
proach was  indeed  a  significant  improvement  over 
shuffling  cards.  However,  two  features  of  this  pro- 
cedure were  questionable.  First,  streamflows  are 
known  to  demonstrate  distributional  patterns  other 
than  normal.  Second,  use  of  random  numbers  dis- 
regards any  serial  correlation  that  may  exist  be- 
tween streamflows. 

A  significant  development  in  the  rationale  for 
stochastic  simulation  consists  of  the  recognition  of 
the  need  for  structuring  mathematical  models. 
Components  of  such  a  model  should  reflect  statis- 
tical characteristics,  often  demonstrated  by  stream- 
flow  data  arrays.  Examination  of  streamflow  se- 
quences would  reveal  the  influence  of  either  a 
persistence  or  a  deterministic  component  and  a 
random or a stochastic component. The term
"persistence" refers to the manifestation of
"effects" even after "causes" are removed. In the
context of streamflow events, persistence is commonly
reflected by the likelihood of high runoff
events following high runoff events and, likewise, of low
runoff events following low ones. Additional variability
in streamflow from one month to the next,
unexplainable  by  persistence  alone,  is  customarily 
lumped  into  the  random  component.  Conceptually, 
the  aforementioned  characteristics  govern  the 
structure  of  currently  known  mathematical  models. 
One  of  the  first  formulations  was  advocated  by 
Thomas and Fiering (10). In principle, the model
relates the flow in any period as a linear function
of the flow in the preceding period with the appropriate
superimposition of a random component. Recognition
of "effects" due to antecedent causes,
thereby incorporating the serial correlation, makes
their model applicable to flow regimen of any
duration.
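A lag-one model of this kind can be sketched in a few lines of Python. The sketch assumes the commonly quoted monthly form, in which each flow equals the monthly mean, plus a regression term on the preceding month's departure from its mean, plus a normal random term scaled by the unexplained variance. The function name, parameter names, and the use of normal deviates are illustrative assumptions, not details taken from reference 10.

```python
import math
import random


def lag_one_monthly_flows(means, stds, lag1_corr, n_years, seed=0):
    """Generate 12 * n_years monthly flows from a lag-one linear model.

    means[j], stds[j] : mean and standard deviation of flows in month j
    lag1_corr[j]      : correlation between month j and month j + 1
                        (index 11 links December to the next January)
    """
    rng = random.Random(seed)
    flows = []
    previous = means[-1]           # start from the mean of the month before month 0
    for k in range(12 * n_years):
        j = k % 12                 # current month
        p = (j - 1) % 12           # preceding month
        r = lag1_corr[p]
        slope = r * stds[j] / stds[p]                    # regression coefficient
        noise = rng.gauss(0.0, 1.0) * stds[j] * math.sqrt(1.0 - r * r)
        flow = means[j] + slope * (previous - means[p]) + noise
        flows.append(flow)
        previous = flow
    return flows


# Hypothetical statistics for illustration only.
example_means = [20.0, 25.0, 40.0, 80.0, 120.0, 90.0,
                 40.0, 20.0, 15.0, 15.0, 18.0, 20.0]
example_stds = [5.0] * 12
sequence = lag_one_monthly_flows(example_means, example_stds, [0.6] * 12, n_years=3)
```

Because the random term is normal, this sketch can return negative flows when the standard deviation is large relative to the mean, which is one reason for working with transformed flows as in the procedure adopted later in this paper.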

A  multitude  of  models  providing  the  methodology 
for  stochastic  streamflow  simulation  are  available. 
Notable  examples  of  some  of  these  models  are 
postulated by Thomas-Fiering (3, 4, 10); Beard (1);
Harms and Campbell (6); Young and Pisano (11);
Mandelbrot  and  Wallis  (7);  and  Payne,  Newman, 
and  Kerri  (9).  A  comprehensive  summary  of  the 
state  of  the  art  is  presented  by  Butcher  and 
others  (2). 

Study  Area 

The upper reaches of the Truckee River, Nevada-
California, form the regional basis for the simulation
studies reported in this paper. In this study,
the hydrologic system is confined to the
monthly streamflow regimen described by a six-station
configuration. Hydrologic history of Truckee
River,  like  several  other  river  basins  of  Western 
United  States,  is  generously  characterized  by  stor- 
age, diversions,  and  irrigation  demands  with  regard 
to  surface  water  resources.  Selection  of  stations  for 
simulation  was  principally  governed  by  (1)  avail- 
ability of  reasonably  long  and  consistent  information 
on  streamflow;  (2)  stations  in  the  vicinity  of  use 
points;  and  (3)  reaches  where  critical  decisions  are 
likely  to  be  made  for  maximizing  the  water  resource 
benefits.  Generation  of  monthly  streamflow  se- 
quences for  each  of  the  six  satellite  stations  is  the 
central issue of this paper. Regional location of the
study  area  is  illustrated  in  figure  1. 

Data  Preparation 

An  inventory  of  available  streamflow  data 
uncovered  several  gaps.  Small  dams  were  com- 
missioned on  Donner  Creek,  Prosser  Creek,  and 
Little  Truckee  River.  In  conjunction  with  the  Lake 
Tahoe diversion dam, these hydraulic structures
regulate approximately 70 percent of the flow in
Truckee River. A fundamental task in data preparation
has involved transformation of regulated flow
data to corresponding natural flow data, namely the
flows which would otherwise occur if it were not for
regulation, but with the prevailing historic land use
patterns. Using known monthly flow release practices,
diversions wherever applicable, and stage-discharge
rating curves, together with information
on exports from Lake Tahoe Basin, back-routing
was performed to transform regulated streamflow
traces  to  natural  flow  regimen  for  the  study  area. 


The  resulting  unregulated  monthly  flow  volumes, 
with  regard  to  data  sizes,  can  be  identified  from 
table  1.  Figure  2  illustrates  the  data  assemblage 
for  the  six  stations  indicating  the  gaps  in  the  record. 


Figure 1. Regional location of satellite stations.

Figure 2. Record lengths of monthly streamflow.

Methodology 

After a relatively thorough search for methodology
applicable to monthly streamflow simulation
in a multisite situation, Beard's (1) procedures
were adopted in this study. This decision was
reached  for  the  following  reasons: 

1.  Convincing  evidence  has  prevailed  that  the 
method  will  receive  widest  possible  application 
among  a  variety  of  situations.  Consequently,  prag- 
matic familiarity  and  experimentation  with  the 
method  were  considered  highly  desirable. 

2.  Data  inventory,  as  illustrated  in  figure  2,  has 
reflected  gaps,  and  the  method  provides  for  filling 
in  missing  flows  prior  to  sequential  generation. 

3.  Computer  program  packages  are  readily  avail- 
able for  administering  the  procedure. 

The  aforementioned  program  package  has  the 
following  optional  routines: 
1. Analysis (computation of statistics of interrelated
stations);

2.  Filling  in  missing  flows  and  testing  for  con- 
sistency of  matrices  of  correlation  co- 
efficients; 

3.  Generating  streamflow  sequences  of  any 
desired  length  using  statistics  obtained  in 
step  1; 

4.  Use  of  generalized  (or  regional)  statistics, 
to  be  furnished  by  users,  for  analysis  and 
generation;  and 

5.  Maximum,  minimum  and  averages  for  re- 
corded, reconstituted,  and  generated  flows 
for  1-,  6-,  and  54-month  periods. 

The  above  options,  except  (4),  were  used  in  this 
study.  The  procedure  has  the  following  character- 
istic features: 

1. Incrementing the monthly flows at each satellite
station by 1.2 percent of average flow for that
station and month. The increment, q_i, chosen
arbitrarily, is subtracted from generated sequences
in the final stage of computations. Incrementing
historic flow data avoids the possibility of negative
logarithms. Logarithmic transformation of incremented
flow data is the initial stage of computations.

2.  Computation  of  mean,  standard  deviation, 
and  coefficient  of  skew  for  each  station,  and  for 
each  calendar  month. 

Calculated skew coefficients are smoothed and
truncated to lie within the range of -1 to +1:
smoothed skew coefficients lower than -1 are
set equal to -1, and those greater than +1
are set at +1.

Table 1. Monthly streamflow (unregulated) data assemblage

Station No.   Station name              Drainage area    Length of record
                                        (square miles)   (years)
101           Inflow to Tahoe Basin     505              1925-67
102           [not legible]             47               [not legible]
103           [not legible]             14.6             [not legible]
104           [not legible]             53.6             1943-67
105           [not legible]             172              1939-67
106           [not legible]             932              1925-67

3. Transformation of historic flow data to Pearson
Type III standard deviates:

$$t_{i,m} = \frac{X_{i,m} - \bar{X}_i}{S_i} \tag{1}$$

where

$$X_{i,m} = \log (Q_{i,m} + q_i) \tag{2}$$

and

$$\bar{X}_i = \sum_{m=1}^{N} X_{i,m} / N \tag{3}$$

where,

i = month number (1, 2, 3 . . . 12)
m = year number (1, 2, 3 . . . N)
S = standard deviation
t = Pearson Type III standard deviate
Q = monthly streamflow
q = flow increment
X = logarithm of incremented monthly flow
N = number of years of record.

4. Transformation of Pearson Type III deviates,
t values, to normal standard deviates, preserving
the smoothed and truncated skew coefficients of
flow data.

$$K_{i,m} = \frac{6}{g_i}\left[\left(\frac{g_i t_{i,m}}{2} + 1\right)^{1/3} - 1\right] + \frac{g_i}{6} \tag{4}$$

where,

K = normal standard deviate
t = Pearson Type III standard deviate
g = skew coefficient.

Equation 4 is referred to as Sammons' transformation.

5. Computation of serial correlations and of cross
correlations of lags zero and -1.

6. When gaps in record are encountered, correlation
matrices of step 5 are rendered incomplete.
Missing correlation values are estimated by the
following:

$$R_{ij} = R_{ki} R_{kj} \pm \sqrt{(1 - R_{ki}^2)(1 - R_{kj}^2)} \tag{5}$$


Equation  5  introduces  considerable  arbitrariness 
in  the  procedure.  Missing  correlation  coefficients 
are  assumed  to  lie  within  the  range  advocated  by 
equation  5.  Highest  upper  limit  and  lowest  lower 
limit  are  averaged  to  furnish  the  missing  correlation 
coefficient. 

7. At the end of step 6, correlation matrices are
rendered complete. Consistency of each of the
matrices is tested to see whether the principal
diagonal is 1, and that multiple correlation coefficient
is 1. If this condition is not realized,
arbitrary adjustments on the radical in equation 5
are made. Consequently, the test procedure is
iterative, and its rationale is subjective as well as arbitrary.

8. Reconstitution of missing flows with the aid
of a recursion equation between normal standard
deviates, that is, the K_{i,m} values of step 4, as follows:

$$K_{i,j} = b_1 K_{i,1} + b_2 K_{i,2} + b_3 K_{i,3} + \cdots + b_{j-1} K_{i,j-1} + b_j K_{i-1,j} + b_{j+1} K_{i-1,j+1} + \cdots + b_n K_{i-1,n} \tag{6}$$

where,

K = normal standard deviate
b = regression coefficient
i = month number
j = station number
n = number of stations

Regression coefficients are converted to beta
coefficients by means of

$$\beta = b \frac{S_{i-1}}{S_i} \tag{7}$$

where,

β = beta coefficient
b = regression coefficient
S = standard deviation
i = month number

At  this  stage  a  random  component  is  introduced 
as  follows: 

$$\text{Random component} = (1 - R^2)^{1/2} \, (\text{random number}).$$

Conventional  procedures  for  generation  of  random 
numbers  imply  rectangular  distribution  with  mean 
=  0,  standard  deviation  =  0.288,  and  coefficient  of 
skew=0.  From  each  pair  of  successive  random 
numbers  generated  in  accordance  with  rectangular 
distribution,  normally  distributed  random  numbers 
or  deviates  are  computed  as: 


$$Z_{j+1} = (-2 \log_e r_j)^{1/2} \sin (2 \pi r_{j+1}) \tag{8}$$

where,

Z = random number from standard population
j = counter
r = rectangularly distributed random number



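Equation 6 is transformed to:

$$K_{i,j} = \beta_1 K_{i,1} + \beta_2 K_{i,2} + \cdots + \beta_{j-1} K_{i,j-1} + \beta_j K_{i-1,j} + \beta_{j+1} K_{i-1,j+1} + \cdots + \beta_n K_{i-1,n} + (1 - R_{i,j}^2)^{1/2} Z_{i,j} \tag{9}$$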

9. Normal standard deviates obtained in step 8 are converted to Pearson Type III variates using Sammons' inverse transformation. Conceptually, this segment of analysis is a reversal of step 4. The transformation is given by

\[ t_{i,m} = \frac{2}{g_i} \left\{ \left[ \frac{g_i}{6} \left( K_{i,m} - \frac{g_i}{6} \right) + 1 \right]^3 - 1 \right\} \tag{10} \]

and the flow logarithms

\[ X_{i,m} = \bar{X}_i + t_{i,m} S_i \tag{11} \]

and the flows

\[ Q_{i,m} = [\operatorname{antilog} X_{i,m}] - q_i. \tag{12} \]

10. Generation of synthetic sequences of flow for any desired length of record is facilitated by setting initial flows at satellite stations equal to their respective average flow values. Streamflows for each station and month are computed by determining normal standard deviates with the aid of equation 9 and transforming the deviates to flow values, utilizing equations 10 through 12. As an arbitrary measure, the first 2 years of generated flows are discarded.

11.  Usual  search  techniques  are  employed  to 
evaluate  maximum  and  minimum  flow  volumes  for 
periods  of  1,  6,  and  54  months. 
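The search in step 11 is not spelled out in the text. One common way to do it is a sliding window over the monthly sequence, sketched here with a hypothetical function name.

```python
def extreme_volumes(flows: list[float], window: int) -> tuple[float, float]:
    """Return (maximum, minimum) total volume over `window` consecutive months."""
    if window < 1 or window > len(flows):
        raise ValueError("window must be between 1 and len(flows)")
    total = sum(flows[:window])
    largest = smallest = total
    for i in range(window, len(flows)):
        total += flows[i] - flows[i - window]   # slide the window one month forward
        largest = max(largest, total)
        smallest = min(smallest, total)
    return largest, smallest
```

Calling the function with windows of 1, 6, and 54 months on a generated sequence would give the extreme volumes for the three durations named in step 11.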

Results 

In an initial simulation trial, six sets of generated sequences of 50 years each were obtained for each of six stations. Preliminary scrutiny of the simulated sequences was done by drawing double-mass curves for each month for each of the six satellite stations. Visual examination of the profiles of the double-mass curves led to qualitative conclusions, at the very best. Table 2 compares mean monthly flows at the six satellite stations for the historic flows, the historic and reconstituted flows, and the six sets of generated flows. The indicated flows are gross means averaged over the entire period. Percent deviations of synthesized flow sequences from historic flows are presented in table 3 and illustrated in figure 4.


Table 2. Comparison of mean monthly flows (historic and reconstituted vs. generated)

                                          Station and station number
Types of flow                       Tahoe     Tahoe-Truckee   Donner   Prosser   Boca      Farad
                                    inflow    gain
                                    101       102             103      104       105       106

Historic flows                      ¹ 42.82   5.77            2.25     5.09      11.31     73.99
Historic and reconstituted flows    42.82     5.50            2.26     4.99      11.51     73.89
Generated sets: ²
  1                                 45.90     5.59            2.30     4.98      11.88     74.46
  2                                 47.32     5.51            2.24     5.0[?]    11.72     74.52
  3                                 46.26     5.57            2.30     4.98      11.86     75.09
  4                                 46.14     5.55            2.30     4.99      11.[??]   73.43
  5                                 45.43     5.52            2.30     4.99      11.60     73.43
  6                                 45.71     5.48            2.32     5.0[?]    11.81     74.27

¹ Average monthly flow volumes in 1,000 acre-feet.
² Each generated set is 50 years long.

Figure 3 compares historic and reconstituted flows on an overall mean monthly volume basis. Visual examination of figure 3 cannot fully assess the reliability of the "filling in" process. To scrutinize the results more closely, recourse was made to the routing of water downstream through the study area. The hydrologic water budget is such that:

Farad flow = Tahoe inflow
  + Tahoe-Truckee gain + Donner flow
  + Prosser flow + Boca flow + Truckee-Farad gain. (13)

Rationale of equation 13 readily follows from the water balance from Tahoe to Farad. Upon rearrangement:

Truckee-Farad gain = Farad flow
  - Tahoe inflow - Tahoe-Truckee gain
  - Donner flow - Prosser flow - Boca flow. (14)

Figure 3. Historic flows versus reconstituted flows.

Table 3. Percent deviations

                        Recorded and      Generated flows ¹
Station                 reconstituted     Lowest         Highest
                        flows             deviations     deviations

Tahoe inflow            0                 6.10           10.50
Tahoe-Truckee gain      4.70              -.37           1.64
Donner                  .45               -.89           2.67
Prosser                 -1.96             -.20           1.00
Boca                    1.77              .78            3.21
Farad                   -.14              -.63           1.62

¹ Generated flow averages are compared against those of recorded flow averages.

Equation 14 was utilized as a preliminary basis to scrutinize the generated flow sequences. For reconstituted as well as generated flow volumes, the Truckee-Farad gains were computed month by month. It was observed that 50 to 70 percent of the reconstituted and generated values were negative, that is, less than zero. This was found to be a major discrepancy because the rationale of equation 14 assures positive Truckee-Farad gains.
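The water-budget check of equation 14 is easy to express in code. The sketch below uses hypothetical function names and, as a worked example, the historic mean monthly volumes listed in table 2. The monthly sequences themselves are not reproduced in the paper, so the fraction-negative helper is shown only as it might be applied.

```python
def truckee_farad_gain(farad: float, tahoe_inflow: float, tahoe_truckee_gain: float,
                       donner: float, prosser: float, boca: float) -> float:
    """Equation 14: residual gain between Truckee and Farad."""
    return farad - tahoe_inflow - tahoe_truckee_gain - donner - prosser - boca


def fraction_negative(gains: list[float]) -> float:
    """Share of months whose computed gain is below zero."""
    return sum(1 for g in gains if g < 0.0) / len(gains)


# Historic mean monthly volumes from table 2, in 1,000 acre-feet.
historic_gain = truckee_farad_gain(73.99, 42.82, 5.77, 2.25, 5.09, 11.31)
```

On the historic means the residual gain is positive, about 6.75 thousand acre-feet. A large share of negative monthly gains in reconstituted or generated data therefore signals that the station flows were produced without regard to the budget linking them.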

Reconstituted flows were closely scrutinized and, in an effort to eliminate the anomaly of negative Truckee-Farad gains, the "filled in" values were adjusted using engineering judgement. A second sequence of data processing was launched and, in this undertaking, the input data consisted of 43 years of monthly flow for each of the six satellite stations. In other words, there was no program reconstitution of data involved in this second phase.

Ten sets of flow volumes were generated, only to discover that the Truckee-Farad gains were still negative. Average monthly volumes for reconstituted flows, as well as for the 10 generated sets, are presented in table 4 and illustrated in figure 5.

Discussion  of  Results 

Scrutiny of Tahoe inflow averages in tables 2 and 4 has revealed that generated flow averages were consistently higher than the historic or reconstituted values. In any statistical experiment involving sequential generation of hydrologic data, it is customary to encounter situations where synthesized values fluctuate in a random manner about the mean value of the corresponding historic sequence, rather than lying consistently on one side of it. This divergence prevails in other satellite stations as well, although to a less marked degree. Studies were undertaken to delineate explicit sources for observed discrepancies. The following description is an attempt to narrate the scrutiny of results.

Table 4. Comparison of mean monthly flow (historic and adjusted vs. generated)

                                          Station and station number
Types of flow                       Tahoe     Tahoe-Truckee   Donner   Prosser   Boca      Farad
                                    inflow    gain
                                    101       102             103      104       105       106

Historic and adjusted flows         ¹ 42.82   5.07            2.04     4.87      10.83     73.99
Generated sets: ²
  1                                 46.21     5.07            2.08     4.82      11.11     74.22
  2                                 44.94     5.12            2.09     4.83      11.29     74.53
  3                                 45.44     5.04            2.13     4.88      11.12     74.13
  4                                 45.30     5.15            2.10     4.80      11.22     74.02
  5                                 45.59     5.14            2.08     4.81      11.17     74.21
  6                                 48.21     5.14            2.10     4.85      11.20     75.05
  7                                 46.16     5.11            2.10     4.82      11.11     74.31
  8                                 46.54     5.16            2.11     5.01      11.17     74.36
  9                                 46.18     5.08            2.08     4.81      11.15     74.38
  10                                46.70     5.07            2.06     4.84      11.17     74.23

¹ Average monthly flow volumes in 1,000 acre-feet.
² Each generated set is 50 years long.

Figure 4. Percent deviation of generated flow volumes (mean monthly).

Random  Number  Generator 

One of the first aspects examined was the accuracy rendered by the random number generator. Roefs ² and the authors have examined the first three moments as well as the serial correlation coefficient of lag 1 for an array of 100,000 numbers generated by the random number generator. The results were found to conform with the theoretical values of the rectangular distribution, that is, mean = 0.5, standard deviation = 0.288, coefficient of skew = 0.0, and serial correlation coefficient of lag 1 = 0.0.

² Roefs, T. G. Personal communication. 1969.
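A check of this kind can be reproduced in outline as follows. The sketch uses Python's built-in generator, not the generator examined by the authors, and the function name is hypothetical. It simply computes the four statistics named above for a sequence of rectangular numbers on (0, 1).

```python
import math
import random


def uniform_diagnostics(n: int, seed: int = 1) -> dict[str, float]:
    """Mean, standard deviation, skew, and lag-1 serial correlation of n uniforms."""
    rng = random.Random(seed)
    r = [rng.random() for _ in range(n)]
    mean = sum(r) / n
    dev = [v - mean for v in r]
    var = sum(d * d for d in dev) / n
    std = math.sqrt(var)
    skew = sum(d ** 3 for d in dev) / (n * std ** 3)
    lag1 = sum(dev[i] * dev[i + 1] for i in range(n - 1)) / (n * var)
    return {"mean": mean, "std": std, "skew": skew, "lag1": lag1}
```

The theoretical standard deviation of a rectangular distribution on (0, 1) is 1/sqrt(12), or about 0.288, which is the value quoted in the text.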


Mean,  Standard  Deviation,  and  Skewness 

A consolidated summary of the monthly values for the first three moments for all the satellite stations is presented in the Appendix to facilitate comparison of statistics of the historic data with the corresponding statistics of each of the six generated sets. Figures 6 and 7 illustrate comparison of monthly mean flows of selected generated data sequences with the corresponding values of historic data.

Figure 6 reveals that generated flows at Tahoe inflow are consistently higher than the historic flow values during the low flow regimen of the river, that is, approximately October through March. Only trivial discrepancies have been observed at the other satellite stations.

Mean monthly standard deviations for selected stations are illustrated in figures 8 and 9. Erratic variations of generated data from those of historic data are evidenced for several situations. Discrepancies were found to be rather pronounced in the months characterized by low flow regimen.



Mean monthly skew coefficients are illustrated in figures 10 and 11. Deviations are consistently more severe during the low flow regimen. The rationale for simulation includes smoothing and truncating the coefficients of skew of logarithms of historic data. Observed discrepancies can be partially explained by noting that smoothing skew coefficients of logarithms of values is not equivalent to smoothing skew coefficients of the corresponding absolute values.
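The distinction between the skew of logarithms and the skew of the values themselves can be illustrated with synthetic numbers. In the sketch below the log-flows are drawn from a symmetric (normal) distribution with hypothetical parameters, so their skew is close to zero, yet the corresponding flows are distinctly skewed. The data are invented for illustration and are not the study's flows.

```python
import math
import random


def skew(values: list[float]) -> float:
    """Coefficient of skew (third standardized moment, population form)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return sum((v - mean) ** 3 for v in values) / (n * var ** 1.5)


rng = random.Random(1)
log_flows = [rng.gauss(2.0, 0.5) for _ in range(50000)]   # hypothetical natural-log flows
flows = [math.exp(x) for x in log_flows]

skew_of_logs = skew(log_flows)    # near zero: the logarithms are symmetric
skew_of_flows = skew(flows)       # clearly positive: the flows themselves are skewed
```

Since the two skew coefficients are related nonlinearly, an adjustment that looks minor in the logarithmic domain need not be minor in the domain of the flows.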

Correlograms 

Figures 12 and 13 illustrate the correlograms for Tahoe inflow and Prosser, respectively. The cyclical patterns of historic and generated sets of Tahoe inflow reflect the same periodicity, but with different attenuations. Generated data appear to be less correlated than historic data for Tahoe inflow. The correlogram for Prosser, as illustrated in figure 13, reveals close agreement of patterns with statistically insignificant discrepancies.
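For reference, a correlogram is the set of serial correlation coefficients plotted against lag. A minimal sketch of the computation follows. The function name is hypothetical, and the estimator shown (lagged products divided by the total sum of squares) is one common convention, not necessarily the one used for figures 12 and 13.

```python
def correlogram(series: list[float], max_lag: int) -> list[float]:
    """Serial correlation coefficients for lags 0 through max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[i] * dev[i + lag] for i in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]
```

For monthly flows with an annual cycle, the coefficients would be expected to peak near lags of 12, 24, and so on, and to dip near lags of 6, 18, and so on. Weaker peaks in the generated data correspond to the stronger attenuation noted above.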
Frequency Distributions

Flow patterns for Tahoe inflow were examined in terms of frequency distributions. Flows in May and August were reckoned to be representative of "high" and "low" flow regimen, respectively. Figures 14 and 15 illustrate the frequency distributions.

With regard to high flows, the generated sets underestimate the probable flow occurrences. In contrast, the generated sets grossly overestimate the low flows in comparison with the observed low flow regimen.

General  Comments 

Four significant characteristic features associated with the stochastic simulation of streamflow in a multisite situation appear to need special consideration. These are related to frequency distributions, smoothing of skew, hydrologic overlap, and sample length versus simulated length.

Figure 6. Mean monthly flow.

Figure 7. Mean monthly flow.

Figure 9. Mean monthly standard deviations.

Figure 10. Skewness.

Figure 11. Skewness.

Figure 12. Correlogram.

Figure 13. Correlogram.

Figure 15. Tahoe inflow: distribution of August flows.


Frequency Distributions. Methodology adopted in this study advocates unlimited use of the Log-Pearson Type III distribution. The authors believe that such a practice is in conformity with the uniform techniques recommended by the Water Resources Council. Considerable evidence exists in the literature (5) that it is dangerous to generalize that a single distributional pattern is applicable to all situations and for all time bases. Preferably, historic data should be examined for their distributions station-by-station, season-by-season, and month-by-month. It should be recognized, however, that incorporating different distributions in the same study considerably increases the complexity of the generation rationale.

In the present study, frequency distributional patterns of simulated sequences of Tahoe inflow are significantly different from those of historic data. In addition, Tahoe inflows as incorporated in this study are basically consequences of precipitation on a large body of water. Distribution of precipitation differs from that of streamflow in any hydrologic situation. Furthermore, the timing of precipitation and the consequent timing of streamflow are different. Application of a "blanket-distributional pattern" to Tahoe inflows has consequently led to the discrepancies discussed earlier.

Smoothing of Skew. Methodology adopted in this study advocates smoothing and truncating skew coefficients. It is believed that such a practice will regionalize the statistic as well as correct for small sample sizes. In terms of its effects on the simulated sequences, as pointed out earlier, discrepancies were rather pronounced, primarily because smoothing the skew coefficients of logarithms of values can never be reckoned as smoothing the skew coefficients of the values themselves. These features are further compounded by additional problems associated with the very manner by which skew is estimated and its stability (or the lack of it) in relation to sample size.

Hydrologic Overlap. In a multiple site situation, such as this study region, the preparation of data for simulation should be such that the satellite stations do not overlap in terms of their hydrologic behavior. Flow accretions rather than the entire flow from a downstream station should be included in data analyses for synthesis.

Secondly, no known rationale for stochastic streamflow simulation provides a mechanism to insure water budget and its routing. In a multisite configuration where hydrologic overlap is likely to occur, "water budget" becomes an important issue and, if ignored, would undoubtedly lead to discrepancies.

Sample Length Versus Simulated Length. The relatively profuse studies in the area of stochastic simulation appear to lack one aspect in common. Sample size requirements of historic data as related to the desirable length of generated sequences need special consideration. Such an interrelationship would provide a rational basis for assigning bounds of the simulated sequences. These aspects are of particular concern when simulation is undertaken to examine water management policies within the perspective of floods, droughts, and "critical periods."

Conclusions 

The  following  are  the  principal  conclusions 
reached  after  an  in-depth  scrutiny  of  simulated 
streamflow  sequences: 

1. Lack of meaningful agreement in the statistics of historic and simulated sequences for Tahoe inflow, particularly for the low flow regimen, is attributable to the use of a blanket "Log-Pearson Type III distribution" for all stations, for all seasons, and for each month.

2. Tahoe inflow is predominantly a consequence of the precipitation on the large body of water. Lag time and the frequency distribution of precipitation and streamflow are distinctly different. Consequently, postulating the same parent distribution has led to discrepancies.

3. Smoothing and truncating the skew coefficients as advocated by the method used in this study is partially responsible for the lack of agreement between historic and simulated sequences. Reasons for this will readily follow from noting that (1) even a minor adjustment of the skew coefficient of logarithms of flows will result in a major discrepancy in the skew coefficient of the absolute value, and (2) smoothing logarithms of values is not equivalent to smoothing the corresponding absolute values.

4. Any rationale for multisite streamflow simulation should provide appropriate linkages in the routines to facilitate a reasonable assurance of water budget.

5. Hydrologic overlap in the sites selected for sequential simulation should be minimized, if not eliminated.

6. Improved techniques should preferably be developed for examining the reliability of simulated sequences. Present practice of comparing the first three moments of historic data with those of simulated data is grossly inadequate.

7. Effects of historic data lengths on the sequential generation of data sequences need to be examined. The significance of this interaction becomes pronounced when generated sequences are utilized in the sensitivity analyses of water management policies with reference to low flows, floods, water quality control and stipulation of standards, and "critical periods."

GENERAL ASPECTS OF MULTIVARIATE ANALYSIS WITH APPLICATIONS
TO SOME PROBLEMS IN HYDROLOGY ¹

By D. G. DeCoursey and R. B. Deal ²


Abstract 

Hydrologic data are collected at many sites throughout the world to provide estimates of rainfall, volumes of runoff, peak rates of flow, and other events. These events are caused by the interaction of such factors as the climatic sequences and the topographic and geologic features of the watersheds. The events themselves are also very highly related, for example, storm volumes and peak rates of flow. Therefore, the analysis of these data becomes a classic example of a multivariate problem in both the input and response characteristics. Twelve independent or causative variables and 37 dependent or hydrologic response characteristics from 90 runoff measuring sites in Oklahoma are analyzed. Conventional multiple regression, stepwise regression, and components regression are compared in the development of prediction equations. Several variations of components regression, some of which use factor analysis, are presented. A variation of the classical multiple discriminant analysis is used to find regions of similar hydrologic response. The regions for some hydrologic variables are geographically contiguous, whereas others are not. Classical canonical correlation is used to find sets of causative and response variables that are related. The concept was used as a base for showing that it is possible to develop prediction equations for various hydrologic characteristics that retain the observed intercorrelation of the hydrologic characteristics. The prediction equations are all based on the same set of independent variables.


¹ Contribution from Agricultural Research Service, USDA, in cooperation with the Oklahoma Agricultural Experiment Station.

² Research hydraulic engineer, USDA, Chickasha, Okla., and professor of biostatistics, Oklahoma University Medical Center, Oklahoma City, Okla., respectively.


Introduction 

Most researchers engaged in water resources work have a variety of approaches and techniques for describing the phenomena they are studying. Many of the models are used to generate synthetic sequences of hydrologic information not only on the watersheds from which they were developed, but also on other watersheds. Thus, these models need a second-order model with which to relate various response variables or parameters to a set of causative or watershed and meteorologic variables. In general, the second-order models are not deterministic, but are rather simple, linear regression models.

The various meteorologic and watershed variables are highly related not only to each other, but also in space and time. Therefore, these data should be considered as observations from a multivariate distribution and, as such, should be analyzed using multivariate techniques. In this paper, three general topics of a multivariate nature are described; other multivariate tools are also mentioned.

The first topic described is regression analysis. The advantage of multivariate analysis in this work is that it tends to develop more rational models than the common regression approaches. The second topic described is discriminant analysis. In this paper, it is shown how this multivariate tool may be used to regionalize hydrologic data. The third topic described is canonical correlation. In this presentation, it is used to maintain intercorrelation between sets of dependent or hydrologic variables.

These topics are illustrated using data furnished by the U.S. Geological Survey. The data include 37 hydrologic and 12 watershed characteristics of 90 runoff measuring stations in Oklahoma (table 1). The results of these analyses are not necessarily optimum solutions and are presented only to illustrate a point. The reader is assumed to have a basic knowledge of multivariate analysis, as a complete description of each would be too lengthy for this paper. Brief mathematical notation is presented for clarity or to extend basic concepts.

Table 1. Hydrologic and watershed variables and their notation

Hydrologic (response) variables

 1.  Q_a     Mean annual flow (c.f.s.)
 2.  Q_1     Mean Jan. flow (c.f.s.)
 3.  Q_2     Mean Feb. flow (c.f.s.)
 4.  Q_3     Mean Mar. flow (c.f.s.)
 5.  Q_4     Mean Apr. flow (c.f.s.)
 6.  Q_5     Mean May flow (c.f.s.)
 7.  Q_6     Mean June flow (c.f.s.)
 8.  Q_7     Mean July flow (c.f.s.)
 9.  Q_8     Mean Aug. flow (c.f.s.)
10.  Q_9     Mean Sept. flow (c.f.s.)
11.  Q_10    Mean Oct. flow (c.f.s.)
12.  Q_11    Mean Nov. flow (c.f.s.)
13.  Q_12    Mean Dec. flow (c.f.s.)
14.  S_a     SDM ¹ annual flow (c.f.s.)
15.  S_1     SDM ¹ Jan. flow (c.f.s.)
16.  S_2     SDM ¹ Feb. flow (c.f.s.)
17.  S_3     SDM ¹ Mar. flow (c.f.s.)
18.  S_4     SDM ¹ Apr. flow (c.f.s.)
19.  S_5     SDM ¹ May flow (c.f.s.)
20.  S_6     SDM ¹ June flow (c.f.s.)
21.  S_7     SDM ¹ July flow (c.f.s.)
22.  S_8     SDM ¹ Aug. flow (c.f.s.)
23.  S_9     SDM ¹ Sept. flow (c.f.s.)
24.  S_10    SDM ¹ Oct. flow (c.f.s.)
25.  S_11    SDM ¹ Nov. flow (c.f.s.)
26.  S_12    SDM ¹ Dec. flow (c.f.s.)
27.  q_2     2-yr. peak rate (c.f.s.)
28.  q_5     5-yr. peak rate (c.f.s.)
29.  q_10    10-yr. peak rate (c.f.s.)
30.  q_25    25-yr. peak rate (c.f.s.)
31.  q_50    50-yr. peak rate (c.f.s.)
32.  V_2     Max. 2-yr., 7-day vol. (s.f.d.)
33.  V_20    Max. 20-yr., 7-day vol. (s.f.d.)
34.  V_50    Max. 50-yr., 7-day vol. (s.f.d.)
35.  M_2     Min. 2-yr., 7-day vol. (s.f.d.)
36.  M_10    Min. 10-yr., 7-day vol. (s.f.d.)
37.  M_20    Min. 20-yr., 7-day vol. (s.f.d.)

Watershed (causative) variables

 1.  A       Drainage area (sq. mi.)
 2.  S       Main channel slope (ft./mi.)
 3.  L       Main channel length (miles)
 4.  A_l     Area of lakes (percent)
 5.  E       Main channel elevation (m.s.l.)
 6.  F       Forest cover as a fraction
 7.  P       Mean annual precipitation (in. per year)
 8.  I_2     Precipitation intensity (2-yr., 24-hr.)
 9.  I_100   Precipitation intensity (100-yr., 24-hr.)
10.  [?]     Average annual Class A pan evaporation
11.  O       Orientation of watershed (radians)
12.  S_i     Soil infiltration index (inches)

¹ Standard deviation of mean.

Regression 

In the introduction it was stated that, in general, most hydrologic research is oriented toward developing a predictive equation for the dependent variable. The most stable equation is probably deterministic or rational in design. The Stanford watershed model and the many versions that are in use are examples of such predictive equations. Quite often, either the data for such models are not available or the use to be made of the predictions does not justify the time and expense required to fit such a model to the data. An example of such a need might be predictive equations for moments of the distribution of mean annual or monthly flows. For uses such as these, more easily fitted and simpler multiple linear regression models are desired. In the following paragraphs, some of the techniques, problems, and fallacies of developing the simpler models are discussed.

Probably the first questions to arise in the development of linear models are: How many variables should be correlated with the dependent variable, and how should they be selected? The easiest answer to these questions is to include all variables and run a multiple linear regression on them. This is a poor solution to the problem because variable interaction can lead to very unrealistic coefficients. To illustrate this, consider the prediction of mean annual flow at the 90 stations in Oklahoma based on the 12 watershed characteristics given in table 1. The coefficients, R², and standard error of estimate, SEE, of this equation are presented as equation 1 in table 2. The coefficients of E and I_100 are negative, whereas in reality they should be positive.

A better technique is to use a form of stepwise variable selection. In this method, the potential increase in the regression sums of squares for each variable is calculated, the variable that contributes most to the regression sums of squares is selected, and a multiple regression is calculated using this variable and all previously selected variables. The procedure is then repeated for all remaining variables until the contribution of the most significant variable remaining is no longer significant as indicated by an F test. At this point, the regression equation is determined on the basis of the variables selected. Equation 2 in table 2 is an equation predicting mean annual flow using the stepwise selection technique. Only four of the 12 variables were selected, yet the standard error of estimate is less than with all 12 variables. The standard error is smaller even though the R² is smaller because of the change in degrees of freedom used in the analysis. A 5-percent level of significance was used in the selection.
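The forward selection just described can be sketched in plain Python. This is an illustrative outline, not the program used by the authors: the function names are hypothetical, the normal equations are solved by simple Gaussian elimination, and a fixed F-to-enter of 4.0 stands in for a tabulated 5-percent critical value (roughly the value for 1 and 60 degrees of freedom). It also assumes more observations than candidate variables and no exactly collinear columns.

```python
def residual_ss(columns: list[list[float]], y: list[float]) -> float:
    """Residual sum of squares of y regressed on the given columns plus an intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    # Augmented normal equations [X'X | X'y], reduced by Gauss-Jordan elimination.
    a = [[sum(row[r] * row[c] for row in design) for c in range(p)]
         + [sum(row[r] * yi for row, yi in zip(design, y))] for r in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * z for x, z in zip(a[r], a[col])]
    coef = [a[r][p] / a[r][r] for r in range(p)]
    return sum((yi - sum(c * v for c, v in zip(coef, row))) ** 2
               for row, yi in zip(design, y))


def forward_stepwise(x: dict[str, list[float]], y: list[float],
                     f_to_enter: float = 4.0) -> list[str]:
    """Add, one at a time, the variable giving the largest drop in residual SS."""
    selected: list[str] = []
    rss = residual_ss([], y)
    while len(selected) < len(x):
        trials = {name: residual_ss([x[s] for s in selected] + [x[name]], y)
                  for name in x if name not in selected}
        best = min(trials, key=trials.get)
        dof = len(y) - len(selected) - 2        # residual degrees of freedom after entry
        mean_square = trials[best] / dof
        if mean_square > 0 and (rss - trials[best]) / mean_square < f_to_enter:
            break                               # best remaining variable is not significant
        selected.append(best)
        rss = trials[best]
    return selected
```

Because each entering variable costs one residual degree of freedom, a short equation can have a smaller standard error of estimate than the full regression even though its R² is lower, which is the effect noted for equation 2.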

At least two other versions of this technique exist. One method adds variables to the regression equation in the above manner, but after each addition a test is performed on the significance of all previously selected variables in the presence of the most recently selected variable. This is carried out because variable interaction may cause a previously selected variable to be insignificant in the presence of other variables. Different levels of significance are used for additions to and deletions from the regression equation to prevent cycling of variables into and out of the equation. A second variation in this technique assumes all variables are in the equation initially, and, again using an F test, the most insignificant variable is removed. As in the other versions, the procedure continues until all remaining variables are significant. These techniques have the advantage of retaining most of the correlation that exists between the dependent and independent variables with a minimum number of variables.

A different series of solutions can be obtained using factor and components analyses. Consider first a factor analysis of the 12 independent variables followed by a varimax rotation of the vectors. These vectors are linear combinations of all variables. Such an analysis will no doubt show that most of the variance in the system caused by the 12 independent variables can be explained by q independent and orthogonal factors where q is less than 12. Regressing these q vectors on the dependent variable will produce a regression equation similar to equation 1 in that all independent variables will be involved. It does, however, have the advantage that the coefficients will be more stable from one data set to another than will those of the simple multiple regression. This is because only the significant variance in the data matrix is considered.

Consider again the example of estimating the mean annual flow from the 12 watershed characteristics. Shown in table 3 are the 12 rotated eigenvectors and their corresponding eigenvalues. Only values greater than 0.5 are shown. There are only about six significant orthogonal components in this matrix; these account for about 94 percent of the variance in the data matrix. These six major eigenvectors regressed on mean annual flow produced equation 3 in table 2. Notice that the coefficients of most variables are of the right sign as compared to multiple regression, equation 1. See Kendall (0, pp. 71-75) for another example of this approach.

Several  variations  of  this  technique  exist.  Suppose 
in  the  previi>us  example  that  instead  oi  selecting 
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the  largest  and  most  significant  roots  to  regress  on 
the  dependent  variable,  the  correlation  between 
each  root  and  the  dependent  variable  (mean  annual 
flow)  is  calculated  and  only  the  vectors  with  signifi- 
cant roots  in  this  sense  are  regressed  on  the  de- 
pendent variable.  Since  each  of  the  roots  is  inde- 
pendent, the  correlations  are  additive.  It  is  therefore 
easy  to  apply  an  F  test  to  the  significance  of  each 
of  the  correlations.  Table  4  shows  the  correlation 
between  each  of  the  12  vectors  in  table  3  and  the 
mean  annual  flow.  If  these  correlations  are  ranked 
and  an  F  test  made  on  the  ratio  of  the  correlation 
contributed  by  a  vector  to  the  yet  unexplained 
correlation,  the  significant  vectors  may  be  deter- 
mined. These  are  starred  in  the  table.  The  regres- 
sion equation  based  on  these  vectors  is  number  4 
in  table  2.  This  equation  has  a  low  standard  error 
of estimate, but some of the coefficients have illogical
signs. Illogical signs are likely to develop
even  in  this  technique  when  a  large  number  of 
variables,  some  of  which  may  have  no  relation  to  the 
dependent  variable,  are  included. 
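
Because the contributions of orthogonal vectors add, the ranking step can be checked by simple arithmetic. The sketch below uses the correlations listed in table 4, ranks them, and accumulates the explained total. The ratio it prints (a vector's contribution divided by the fraction still unexplained once that vector has entered) is only one plausible reading of the F-test ratio described above, and the function name is illustrative.

```python
# Correlation contributed by each vector (table 4), keyed by vector number.
contribution = {1: 0.0426, 2: 0.5018, 3: 0.0286, 4: 0.0002, 5: 0.0001,
                6: 0.1494, 7: 0.0153, 8: 0.0141, 9: 0.1155, 10: 0.0000,
                11: 0.0000, 12: 0.0005}


def rank_vectors(contrib):
    """Rank vectors by contribution and track the running explained total.

    Returns (vector, contribution, cumulative, ratio) tuples, where
    ratio = contribution / (1 - cumulative) is an ASSUMED form of the
    quantity that would be submitted to the F test.
    """
    ranked = sorted(contrib.items(), key=lambda kv: kv[1], reverse=True)
    rows, cumulative = [], 0.0
    for vector, value in ranked:
        cumulative += value
        rows.append((vector, value, cumulative, value / (1.0 - cumulative)))
    return rows


rows = rank_vectors(contribution)
for vector, value, cumulative, ratio in rows:
    print(f"vector {vector:2d}  contribution {value:.4f}  "
          f"cumulative {cumulative:.4f}  ratio {ratio:.3f}")
```

Ranked this way, the seven vectors marked significant in table 4 come first (2, 6, 9, 1, 3, 7, 8) and together contribute 0.8673. Whether a given vector passes the test still depends on the F critical value, which this sketch does not compute.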

In  general,  it  may  be  assumed  that  if  two  or  more 
variables  in  any  rotated  eigenvector  are  highly 
loaded,  then  there  is  a  high  degree  of  correlation 
between  these  variables.  Under  such  circumstances, 
only  one  of  the  variables  need  be  selected  from  each 
vector  as  representative  of  all  the  highly  loaded 
variables.  Therefore  only  one  variable  need  be 
selected  from  each  significant  vector.  This  method 
of  variable  selection  may  be  applied  to  both  of  the 
multivariate  techniques  just  described;  that  is, 
those  used  to  get  equations  3  and  4.  In  the  first  six 
vectors  the  variables  selected  by  this  means  are 
I100, A, O, Si, Ai, and S. The variables selected
from  the  rotated  vectors  presented  in  table  3  are 
underlined.  A  multiple  regression  equation  on  the 
variables  from  the  first  six  vectors  is  shown  as  num- 
ber 5  in  table  2.  To  select  the  variables  from  the 
vectors  most  highly  correlated  with  the  dependent 
variable  (table  4),  the  significant  vectors  must  be 
varimax  rotated  to  emphasize  the  strongest  vari- 
ables. Table  5  shows  the  significant  vectors  of 
table 4 rotated with the prominent variables underlined.
Only loadings greater than 0.5, or the maximum
value in the vector, are shown. A regression
equation on these seven variables (S, A, L, I2, O,
Ep, and F) is presented as equation 6 in table 2.

Table 3. - Varimax rotated factor matrix of 12 watershed characteristics ¹
[Blanks in columns indicate values less than 0.5]

Vector       1     2     3     4     5     6     7     8     9     10    11    12
Eigenvalue   5.60  1.73  1.49  1.02  0.82  0.58  0.34  0.20  0.13  0.05  0.03  0.01

Variables (rows): A, S, L, Ai, E, F, P, I2, I100, Ep, O, Si

Loadings in the order extracted (row and column positions are not recoverable from the source text):
0.91, -.58, -.82, -.91, -.93, -0.[illegible], -.78, 0.97, 0.94, 0.98, 0.96, -0.85, 0.69

¹ The 12 watershed characteristics were given on page 50.


This  equation  is  considerably  better  than  number 
5  which  was  based  on  only  the  most  significant 
roots  of  the  correlation  matrix.  This  shows  that 
sometimes  a  large  part  of  the  variance  in  a  data 
matrix  may  not  be  associated  with  the  dependent 
variable.  The  poorer  the  correlation  between  the 
dependent  and  independent  variables,  the  more 
likely  that  this  may  happen.  In  the  example,  the 
fourth  and  fifth  roots  were  not  at  all  associated 
with  the  dependent  variable. 

It  is  possible  to  modify  this  procedure  slightly  and 
include the dependent variable, Qa, as the last
variable  in  the  factor  analysis  and  varimax  rotation. 
The  loading  on  this  variable  in  each  of  the  vectors 
will  show  the  vectors  with  which  it  is  most  closely 
allied.  Table  6  shows  the  results  of  using  this 
technique.  Only  values  greater  than  0.5  are  shown 
except  for  vector  11  and  the  loadings  on  the  de- 
pendent variable.  The  most  significant  variable  in 
each  vector  is  underlined.  The  high  loading  in 
vector 2 of both the dependent variable Qa and
drainage area A shows that most of the explainable
variation in Qa can be explained by the variance
in  A.  The  vectors  that  are  marked  with  an  asterisk 
are  those  assumed  to  be  highly  enough  loaded  on 
the  dependent  variable  to  be  considered.  The 
variables from these vectors (A, S, Ep, and L)
were  used  in  the  regression  equation  shown  as 


number  7  in  table  2.  Notice  that  all  three  methods  of 
selection  lead  to  different  variables. 

This  technique  can  be  modified.  Consider  the 
variables  selected  by  the  last  three  methods  of 
analysis.  Even  though  they  are  as  independent  a  set 
as  possible,  a  certain  degree  of  intercorrelation 
exists. It is quite possible, therefore, that the coefficients
of such equations may not be rational. A
better method might be to regress the dependent
variable, Qa, on the significant roots of these
variables rather than the variables themselves.

Table 4. - Correlation between the eigenvectors of the watershed characteristics and mean annual flow

Vector No.   Eigenvalue   Correlation
1            5.60         *0.0426
2            1.73         *.5018
3            1.49         *.0286
4            1.02          .0002
5             .82          .0001
6             .58         *.1494
7             .34         *.0153
8             .20         *.0141
9             .13         *.1155
10            .05          .0000
11            .03          .0000
12            .01          .0005
Totals       12.00         .8686

* Significant correlation, 5 percent.

Table 5. - Significant eigenvectors of watershed characteristics, varimax rotated

Vector        2     6     9     1     3     7     8
Correlation   0.50  0.15  0.12  0.04  0.03  0.02  0.01

Loadings in the order extracted (vector columns are not recoverable from the source text):
0.96; S [illegible]; L .79, 0.36; Ai; E 0.89; F 0.76; P -.83; I2 -.90, -.89; Ep .55, -0.55; O 0.92; Si .64

The  significant  roots  of  these  variables  would  be 
determined  by  the  same  technique  as  that  used  to 
select  the  significant  roots  of  all  the  variables  (table 
4).  Regression  equations  based  on  this  technique 
are  presented  as  numbers  8,  9,  and  10  in  table  2. 
Note  that  equations  9  and  10  are  identical  with  equa- 
tions 6  and  7,  respectively.  This  is  because  all  roots 
of  the  variables  used  in  these  equations  were 
significantly correlated with Qa. The slight difference
between equations 5 and 8 resulted from using only
the first five of the six roots. Ordinarily, fewer
roots than in these three examples are
found  to  be  significantly  correlated  with  the  depend- 
ent variable,  and  a  greater  difference  between 
the  equations  is  obtained. 

Since  drainage  area  and  precipitation  are  general- 
ly highly  correlated  with  mean  annual  flow,  any 
regression  equation  that  includes  these  variables 
will  not  have  very  many  additional  significant 
variables.  To  reduce  the  effect  of  these  variables  in 
the  regression  equation,  the  mean  annual  runoff  was 
divided  by  the  product  of  drainage  area  and  rainfall 
as  described  in  the  regional  analysis.  Results  of  a 
stepwise  selection  of  variables  using  the  adjusted 


values  of  mean  annual  flow  and  the  12  watershed 
characteristics  are  shown  as  equation  11  in  table  2. 

Note  the  significant  improvement  in  the  standard 
error of estimate using the adjusted data. The equation
was rescaled to the raw data before calculating
the R² and standard error. In addition to the improvement
in the standard error of estimate, a
technique  such  as  this  should  improve  the  stability 
of  the  regression  equation  when  applied  to  other 
data.  Further  investigation  of  this  approach  should 
be  made,  especially  considering  the  spurious 
correlation  that  may  result. 
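
The adjustment and the rescaling described above can be sketched as follows. The function names, the sample values, and the choice of n minus the number of coefficients as degrees of freedom are assumptions made for illustration; the source gives only the idea of dividing runoff by the product of drainage area and rainfall and rescaling before computing the fit statistics.

```python
def adjust_flow(flow, area, rainfall):
    """Runoff per unit drainage area per unit rainfall."""
    return flow / (area * rainfall)


def rescale_flow(adjusted, area, rainfall):
    """Return an adjusted value to the raw flow scale."""
    return adjusted * area * rainfall


def standard_error(observed, predicted, n_coefficients):
    """Standard error of estimate on the raw scale.

    Degrees of freedom are taken as n - n_coefficients (an assumption).
    """
    n = len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return (sse / (n - n_coefficients)) ** 0.5


# Hypothetical watershed: 1,200 cfs mean flow, 500 sq mi, 30 in. of rainfall.
adjusted = adjust_flow(1200.0, 500.0, 30.0)
restored = rescale_flow(adjusted, 500.0, 30.0)
print(adjusted, restored)
```

Fitting on the adjusted values and then rescaling each prediction with `rescale_flow` keeps the standard error comparable with the other equations in table 2, which are all stated on the raw scale.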

Each  of  the  equations  in  table  2  was  based  on  a 
procedure  that  was  designed  to  produce  the  best 
possible  set  of  coefficients.  It  can  be  seen  that  a 
great  variety  of  different  combinations  have  been 
obtained.  However,  one  thing  stands  out:  There  is 
very  little  variation  in  the  standard  error  of  estimate 
and  square  of  the  multiple  correlation  coefficient. 
This  would  indicate,  considering  the  fact  that  we 
are  in  general  dealing  with  a  nonrational  model, 
that  any  one  of  these  methods  is  probably  as  good  as 
the  next,  although  the  technique  used  to  get  equa- 
tions 8,  9,  and  10  is  probably  the  best.  There  are 
slight  advantages  and  disadvantages  to  both  the 
common  regression  techniques  and  the  multi- 
variate techniques. 

Consider first the common linear regression
models. In general, these do not reflect a great
deal  of  stability  when  extrapolated  to  other  data  sets. 
This  instability  is  the  result  of  variable  interaction. 
If  the  variables  are  relatively  independent,  the  equa- 
tions should  be  satisfactory  when  used  on  other  data 
sets.  However,  the  regression  methods  have  an 
advantage  in  that  it  is  possible  to  calculate  standard 
errors  of  the  coefficients  and  confidence  intervals 
for  the  mean  and  individual  observations  (4,  pp. 
179-183). 

Equations  developed  by  multivariate  procedures 
will  tend  to  show  more  stability  than  the  common 
regression  methods  when  applied  to  new  data  sets 
because the coefficients are fitted on the basis of
only the statistically significant orthogonal components.
They have, however, a very definite
disadvantage in that no methods known to the authors
exist  to  calculate  standard  errors  of  the  coefficients 
or  to  place  confidence  limits  on  the  mean  and 
individual  observations. 

The  multivariate  approach  is  based  on  an  equa- 
tion of  the  form: 

in which a is a coefficient; ξ is a component, that is,
a linear combination of the independent variables;
and e is the random error. It is possible to calculate
standard errors for the aj, but these cannot be
related back to the variables unless there are equal
numbers of aj and independent variables. If all the
roots  are  used  in  a  components  regression  such  as 


this,  then  there  will  be  an  equal  number  of  aj  and 
independent  variables.  However,  in  such  a  solution, 
e,  which  is  random  error,  will  be  assigned  to  the 
regression  residuals,  and  the  solution  will  be  iden- 
tical with  a  normal  multiple  regression. 
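
The relation between component coefficients and variable coefficients can be written out as a sketch. The symbols Y, ξ_j, v_ij, x_i, and b_i are assumed here for illustration (the text names only the coefficient a, the component, and the error e), k is the number of components retained out of p variables, and any intercept term is omitted.

```latex
$$
Y = \sum_{j=1}^{k} a_j \xi_j + e, \qquad
\xi_j = \sum_{i=1}^{p} v_{ij} x_i
\quad\Longrightarrow\quad
Y = \sum_{i=1}^{p} b_i x_i + e, \qquad
b_i = \sum_{j=1}^{k} v_{ij} a_j .
$$
```

Under these assumptions each b_i is a combination of several a_j, which is consistent with the statement that the standard errors of the a_j are not directly attributable to individual variables unless k equals p.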

Discriminant  Analysis 

The  hydrologic  design  of  a  structure  is  generally 
based  upon  estimates  of  the  hydraulic  and  hydro- 
logic  characteristics  of  the  river  or  stream  on  which 
it  is  to  be  built.  The  best  estimates  of  these  charac- 
teristics are  based  upon  or  generated  from  records 
of  the  observed  phenomena.  Except  for  a  relatively 
few  large  structures,  most  water  resources  projects 
are not located at streamflow measuring sites. If,
however,  there  are  stations  either  upstream  or 
downstream  from  the  site  in  question,  these  records 
may  be  adjusted  on  the  basis  of  drainage  area  or 
other  watershed  characteristics  to  apply  to  the  area 
studied.  If  no  records  are  available,  then  the  hydro- 
logic  characteristics  must  be  predicted  from  other 
watersheds. 

In  the  previous  section,  some  of  the  methods  for 
developing  prediction  equations  of  hydrologic 
response  were  presented.  At  this  point  the  question 
arises  "For  how  large  an  area  in  the  field  would  this 
description  be  accurate?"  Wallis  (11)  presented  a 
technique  in  which  factor  and  discriminant  function 
analyses  are  used  to  evaluate  the  likelihood  of  good 
results  from  a  prediction  equation  when  applied  to 
data other than that used in the development of the
equation. In hydrologic investigations, the problem
is basically that of regionalization.

Table 6. - Results of factor analysis and varimax rotation of watershed characteristics and mean annual flow

Variables (rows, as extracted): A, S, L, Ai, E, F, P, I2, I100, Si, Qa
Vector labels recovered: 10, 11

Loadings in the order extracted (row and column positions are not recoverable from the source text):
-0.93, -.62, -0.95, -0.57, -0.98, 0.93, -.62, -.85, -.94, -.95, .02, 0.67, *-.98, 0.97, .01, -0.94, -.04, -.01, .10, -0.84, *.05, *.08, -.00, 0.00, 0.17, .14

* Designates the most significant vectors as indicated by the loading on the dependent variable.

MISCELLANEOUS PUBLICATION NO. 1275, U.S. DEPARTMENT OF AGRICULTURE

Most  techniques  used  to  determine  the  applicable 
region  for  prediction  equations  are  rather  subjec- 
tive. For  example,  in  extrapolating  hydrologic  data 
from  gaged  to  ungaged  watersheds,  Benson  and 
Matalas (1) and Thomas and Benson (10) correlated
various  watershed  characteristics  of  large  drainage 
basins  with  hydrologic  characteristics  of  the  stations 
within  these  basins.  The  Water  Resources  Division 
of  the  U.S.  Department  of  Interior,  Geological 
Survey,  is  also  developing  similar  correlations  for  all 
stations within each State (2). Under these circumstances,
the boundaries are governmental and in
general  bear  little  resemblance  to  boundaries  of 
similar  hydrologic  regions.  The  U.S.  Department  of 
Agriculture  (9)  has  subdivided  the  country  into  land 
resource  regions  and  areas  of  similar  soils  and  geol- 
ogy. In  general,  hydrologic  data  are  assumed  to  be 
similar  within  these  areas.  Thus,  in  most  cases, 
the  boundaries  have  been  artificial  and  not  hydrolog- 
ically  accurate. 

This  section  of  the  paper  presents  a  technique 
that  may  be  useful  in  determining  regions  of  similar 
hydrologic  response  that  may  or  may  not  be  geo- 
graphically oriented.  The  method  is  based  upon  a 
cluster  analysis  of  discriminant  scores. 

Suppose  it  was  desirable  to  predict  the  37  hydro- 
logic  or  response  variables  in  table  1  from  the  12 
watershed  characteristics.  Many  of  these  response 
variables  are  similar  and  could,  therefore,  be 
expected  to  be  applicable  over  the  same  general 
area.  To  determine  how  many  different  sets  of  data 
are  represented  by  the  37  variables,  a  factor  analysis 
followed  by  varimax  rotation  was  performed  on  the 
correlation  matrix  of  these  data.  Results  of  this 
analysis  showed  that  five  sets  of  data  were  present. 
They  were:  (1)  The  means  and  standard  deviations  of 
the  mean  annual  flow  of  the  winter  and  spring 
months  (November  through  May),  (2)  the  means  and 
standard  deviations  of  the  summer  and  fall  months 
(June  through  October),  (3)  the  five  peak  flow  rates, 
(4)  the  three  maximum  flow  volumes,  and  (5)  the 
three  minimum  flow  volumes.  In  all  cases,  the  means 
and  standard  deviations  of  the  monthly  flows  always 
remained  in  pairs.  The  split  between  winter-spring 
and  summer-fall  months  is  probably  associated 
with  the  more  or  less  random  occurrence  of  summer- 


fall  events  as  opposed  to  the  rather  stable  winter- 
spring  flows.  The  five  groupings  were  very  clean 
with  the  exception  of  the  peak  flow  rate  for  a  50- 
year  return  period,  the  maximum  7-day  flow  for  a 
50-year  return  period,  and  the  minimum  flow  rates. 
For  a  large  number  of  watersheds,  the  period  of 
record  was  not  long  enough  to  estimate  some  of  the 
response  variables.  Thus,  these  variables  were 
dropped  from  consideration  in  future  analyses. 

The  factor  analysis  showed,  for  example,  that  the 
means  and  standard  deviations  of  the  five  summer 
months  tended  to  represent  one  group  of  data.  If 
prediction  equations  for  these  variables  are  desired, 
it  would  be  logical  to  assume  that  the  best  prediction 
equations  should  be  based  on  data  from  areas  in 
which  the  variation  in  these  variables  is  as  small  as 
possible.  The  data  set  of  90  observations  should 
therefore  be  split  into  two  mutually  exclusive  sets 
such  that  each  set  of  data  has  a  minimum  variance. 
This  may  be  accomplished  by  an  iterative  scheme 
in  which  an  initial  hypothetical  split  is  assumed. 
Linear  combinations  of  the  10  variables  (means  and 
standard  deviations  of  the  5  months)  are  calculated 
such  that  the  ratio  of  the  between-groups  sums  of 
squares to the within-groups sums of squares is a
maximum.  The  value  of  this  linear  combination,  the 
discriminant  score,  for  each  of  the  observations  can 
be  plotted  to  show  separation  of  the  data.  Unless 
one  is  extremely  lucky,  it  is  not  likely  that  he  will 
find  optimum  separation  of  the  data  on  the  first 
estimate,  and  a  large  area  of  overlap  between  the 
two  groups  will  exist.  The  observations  in  this  area  of 
overlap  are  assigned  to  the  group  they  are  more 
nearly  like  based  on  the  discriminant  scores.  At 
this  point,  a  new  grouping  of  the  data  is  obtained, 
and the procedure is repeated. This iterative procedure,
a type of cluster analysis, will tend to
minimize  the  within-group  variance. 
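
The iterative scheme can be sketched for two groups as below. This is a minimal illustration, not the authors' program: it uses a Fisher-type linear discriminant, assigns each observation by comparing its score with the midpoint of the two projected group means (an assumption, since the text does not state the assignment rule), and assumes both groups stay non-empty and the within-groups matrix is non-singular. All function names and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def group_mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def discriminant(x, labels):
    """Coefficients maximizing between/within sums of squares, plus a cutoff."""
    g0 = [r for r, g in zip(x, labels) if g == 0]
    g1 = [r for r, g in zip(x, labels) if g == 1]
    m0, m1 = group_mean(g0), group_mean(g1)
    p = len(m0)
    within = [[0.0] * p for _ in range(p)]
    for rows, mean in ((g0, m0), (g1, m1)):
        for r in rows:
            d = [ri - mi for ri, mi in zip(r, mean)]
            for i in range(p):
                for j in range(p):
                    within[i][j] += d[i] * d[j]
    coef = solve(within, [b - a for a, b in zip(m0, m1)])
    cutoff = sum(c * (a + b) / 2 for c, a, b in zip(coef, m0, m1))
    return coef, cutoff


def iterate_groups(x, labels, max_iter=50):
    """Reassign observations by discriminant score until nothing changes."""
    labels = list(labels)
    for _ in range(max_iter):
        coef, cutoff = discriminant(x, labels)
        scores = [sum(c * v for c, v in zip(coef, r)) for r in x]
        new = [1 if s > cutoff else 0 for s in scores]
        if new == labels or len(set(new)) < 2:
            break
        labels = new
    return labels


# Toy data: two tight clusters, with one point of each mislabeled at the start.
points = [[0, 0], [1, 0], [0, 1], [1, 1], [10, 10], [11, 10], [10, 11], [11, 11]]
start = [0, 0, 0, 1, 0, 1, 1, 1]
final = iterate_groups(points, start)
print(final)
```

The `max_iter` guard is there because convergence is not guaranteed; a run that keeps cycling among a few assignments simply stops at the limit.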

This  method,  used  to  get  the  coefficients  for  the 
variables  and  to  assign  the  observations,  is  the 
same  as  multiple  discriminant  and  classification 
analyses. References (3, 5, 7, and 8) have good descriptions
of these techniques. Standard subroutines
such  as  those  in  IBM's  scientific  subroutine  package 
can  be  easily  modified  to  perform  the  iterative 
scheme  described  above.  The  technique  is  general 
enough  that  it  can  be  applied  to  more  than  two 
groups at a time. As of the writing of this manuscript,
it was not known how stable convergence of
the algorithm is, especially when applied to more
than two groups. If the technique appears to be
unstable, several possibilities exist to improve its
stability. The most stable condition appears to be
the  one  described  above,  that  is,  consideration  of 
only  two  groups.  Additional  subdivision  could  be 
obtained  by  further  subdividing  each  group  into 
two  groups  and  so  on.  In  the  discussion  of  two 
groups,  it  was  stated  that  a  score  could  be  obtained 
for  each  observation.  In  the  multigroup  extension, 
more  than  one  score  can  be  obtained.  In  fact,  the 
number  of  scores  will  be  either  (a)  one  less  than 
the  number  of  groups,  or  (b)  the  number  of  variables, 
whichever is the smaller. It is likely, however,
that  not  all  of  these  scores  will  be  significant.  By 
using  only  the  significant  scores  in  assigning  the 
observation  to  a  group,  the  stability  of  the  system 
should be improved. Stability might also be improved
by not reassigning all watersheds whose
scores fall in the area of group overlap. For example,
maybe  only  those  watersheds  with  a  score  indicating 
a  probability  of  the  other  group  membership  greater 
than  0.8  should  be  reassigned. 
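
Two of the rules in this paragraph are simple enough to state as code. The sketch below is illustrative only: the function names are hypothetical, and the membership probability is assumed to be supplied by whatever classification routine is in use.

```python
def n_discriminant_scores(n_groups, n_variables):
    """Number of scores: one less than the groups, or the variables, whichever is smaller."""
    return min(n_groups - 1, n_variables)


def reassign(current_group, other_group, prob_other, threshold=0.8):
    """Move an observation only when the other group is probable enough."""
    return other_group if prob_other > threshold else current_group


print(n_discriminant_scores(2, 10))   # two groups, 10 variables
print(n_discriminant_scores(4, 3))    # four groups, 3 variables
print(reassign(1, 2, 0.85), reassign(1, 2, 0.60))
```

With two groups and the 10 monthly means and standard deviations there is a single score; a four-group analysis on three discriminators gives three.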

Considering  the  example  of  the  means  and  stand- 
ard deviations  of  the  five  summer-fall  months  and 
the  possibility  of  dividing  the  data  from  the  State 
into  two  groups  based  on  these  characteristics,  it 
was  decided  that  since  the  flow  rate  is  so  heavily 
dependent  upon  both  drainage  area  and  annual 
rainfall,  the  data  might  group  itself  about  these 
two  parameters.  The  data  were  therefore  adjusted 
by  dividing  the  hydrologic  variables  by  the  product 
of  annual  basin  rainfall  and  drainage  area  as 
described  in  analysis  of  the  last  regression  equation 
in  table  2.  Mean  monthly  runoff  becomes  cubic  feet 
per  second  per  square  mile  inch  of  rainfall.  Differ- 
ences in  this  figure  from  one  watershed  to  another 
would  thus  primarily  reflect  differences  in  geology 
rather  than  drainage  area  and  rainfall. 

The  90  runoff  measuring  stations  were  arbitrarily 
assigned  to  one  of  the  two  groups,  and  the  iterative 
procedure  described  above  was  used  on  the  data. 
Figure  1  shows  the  group  separation  for  the  initial 
group assignment based on scores for the two groups.
The  plots  are  also  based  on  an  assumption  that  the 
scores  are  normally  distributed;  tests  have  shown 
this  to  be  a  reasonable  assumption  (3,  5).  At  con- 


vergence, when  no  new  reassignment  was  made, 
discrimination  between  the  groups  was  highly 
significant,  as  shown  on  figure  2.  The  geographical 
distribution  of  the  two  groups  is  shown  on  the  State 
map of figure 3. In general, the western one-third of
the State appears to be in one group, and the eastern
two-thirds in the other, with the exception that
the large river systems (Arkansas, Canadian,
Cimarron, Washita, and Red) appear to retain the
characteristics  of  the  area  they  drain  for  some  dis- 
tance downstream.  Predictive  equations  developed 
specifically  for  each  of  these  areas  would  certainly 
be  much  better  than  those  developed  for  the  whole 
area. 

Cluster  analyses,  as  previously  described,  were 
also  run  on  the  remaining  four  groups  of  similar 
hydrologic  variables.  The  results  of  these  analyses 
are shown on figure 4. The statistical term D²
shown on each figure is Mahalanobis' D², an index
of the degree of group separation. The D² values
ranging from 144 to 363 indicate good group separation.
The State map in the upper left part of figure
4  shows  the  subdivision  of  the  State  based  on  the 
flow  means  and  standard  deviations  of  the  winter- 
spring  months  and  the  annual  series.  The  lower  left 
part  shows  the  solution  for  the  summer-fall  months 
(previously  described).  The  upper  right  part  shows 
the  solution  for  maximum  volumes;  and  the  lower 
right  part,  the  solution  for  peak  flow  rates.  All  of  the 
maps,  except  that  for  peak  rates,  are  quite  similar. 
This  would  indicate  that  the  same  areal  subdivision 
could  probably  be  used  for  all  three  groups. 

The  solution  for  peak  rates  of  flow,  shown  in  the 
lower  right-hand  plot  on  figure  4,  does  not  appear 
to  have  any  geographical  significance.  However, 
it  may  still  be  possible  to  make  use  of  this  group 
separation  to  obtain  maximum  predictive  utility. 
Since  the  separation  obtained  in  this  solution, 
D² = 144, is fairly high with respect to the response
variables  (peak  flow  rates  for  2-  to  25-year  return 
periods), there is probably a combination of watershed
variables to which this group separation may be
attributed. A discriminant analysis of the 12
watershed characteristics, based on the above
group separation, shows that the most significant
variables are the rainfall intensities for 2- and 100-
year  return  periods,  the  average  watershed  slope, 
and  the  average  rainfall  on  the  watershed.  If  the 
variables that contribute most to peak flow rates
per  square  mile  inch  of  rainfall  were  rationally 
selected,  rainfall  intensity  and  watershed  slope 
would  probably  be  selected  as  the  most  significant. 
Thus  the  discriminant  analysis  leads  to  a  logical 
conclusion.  The  discriminant  equation  that  can  be 
used  to  differentiate  between  the  two  groups  is: 

$$D = 0.041\,S + 0.019\,P - 0.90\,I_2 + 0.44\,I_{100}$$

A  discriminant  score  greater  than  1.63  would  indi- 
cate that  the  peak  prediction  equations  for  group  2 
should  be  used.  Correspondingly,  a  value  less  than 
1.63  would  indicate  that  the  peak  prediction  equa- 
tions for  group  1  should  be  used. 
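
Applying the discriminant equation is a one-line calculation. The sketch below takes the four coefficients and the 1.63 cutoff from the text; the input values and their units are hypothetical, and a score exactly equal to the cutoff is sent to group 1 here only because the text does not cover that case.

```python
# Coefficients of the discriminant equation for S, P, I2, and I100.
COEFFICIENTS = {"S": 0.041, "P": 0.019, "I2": -0.90, "I100": 0.44}
CUTOFF = 1.63


def discriminant_score(watershed):
    """Weighted sum of the four watershed characteristics."""
    return sum(COEFFICIENTS[name] * watershed[name] for name in COEFFICIENTS)


def peak_flow_group(watershed):
    """Group 2 if the score exceeds the cutoff, otherwise group 1."""
    return 2 if discriminant_score(watershed) > CUTOFF else 1


# Hypothetical watersheds; values chosen only to exercise both branches.
gentle = {"S": 10.0, "P": 30.0, "I2": 1.5, "I100": 4.0}
steep = {"S": 20.0, "P": 40.0, "I2": 1.5, "I100": 4.0}
print(discriminant_score(gentle), peak_flow_group(gentle))
print(discriminant_score(steep), peak_flow_group(steep))
```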

As  a  further  example  of  the  use  of  cluster  analysis 
in  regionalization,  assume  that  the  mean  annual 
flow, Qa, and its standard deviation, Sa, along with
the  annual  maximum  flow  rate  approximated  for  this 
example  by  P,  were  needed  to  generate  a  synthetic 


sequence  of  data  at  an  ungaged  location.  Estimates 
of  these  three  variables  will  therefore  be  necessary, 
and  it  would  be  desirable  to  know  which  region  the 
ungaged  area  is  most  similar  to  and  then  predict 
these  variables  from  equations  based  on  data  from 
this  region.  By  using  the  three  variables  as  dis- 
criminators in  a  cluster  analysis,  the  different 
regions  may  be  identified.  The  upper  drawing  in 
figure  5  shows  the  two  regions  that  developed  from 
this  analysis.  Figures  6  and  7  show  the  distribution 
of  discriminant  scores  for  the  first  estimate  of  the 
subdivision  and  upon  convergence.  The  first  esti- 
mate in  this  case  was  an  assignment  of  the  first  45 
watersheds  into  one  group,  and  the  second  45  into 
the  other  group. 

To  investigate  the  multigroup  extension  of  this 
concept, each of the two groups above was further
divided.  The  four  groups  that  resulted  from  this 
process  were  then  used  as  the  initial  estimate  for 
a four-group cluster analysis. A very logical subdivision
resulted as shown on the lower drawing in
figure 5. Figure 8 shows the extreme group separation
that resulted from this analysis.

Figure 2. - Distribution of discriminant scores for two regions of Oklahoma (at convergence).

Figure 3. - Geographical distribution of watersheds (two groups).

Figure 4. - Geographical distribution of watersheds (two groups) for four different sets of discriminators.

One of the measures of the effectiveness of any
regionalization technique is the improvement in
predictive ability obtained by breaking the area
into smaller, more homogeneous areas. Take, for
example, prediction of the mean, standard deviation,
and annual peak rate of runoff previously discussed.
The first subdivision of the State on the basis of
these three response variables shows 47 watersheds
in one group and 43 in the other. Consider a linear
regression equation for predicting the mean
annual flow. If all 90 stations are used to derive this
equation, a stepwise selection of variables with 5-
and 10-percent levels of significance to accept and
reject variables, respectively, yields the following
equation:

$$Q_a = 0.24\,A - 58\,S - 3.0\,L + 36\,P - 273$$

The equation has a standard error of estimate of
910 with an R² of 0.857. When this equation is used
to predict the mean annual flow for the 43 watersheds
in the western part of the State, the standard
error of estimate is 1044 with an R² of 0.904. If used
to predict the mean annual flow for the 47 eastern
watersheds, the standard error of estimate is 832
and the R² is 0.516. To see if subdivision of the
State into the two groups improves the prediction of
mean annual flow rate, a stepwise selection of
variables based on the 43 western watersheds yields
the following equation:

$$Q_a = 0.25\,A - 3.9\,L + 247$$

The standard error of estimate is 1009 with the
R² equal to 0.905. Calculated in a similar manner,
the equation based on the 47 eastern watersheds is:

$$Q_a = 0.61\,A + 52\,P - 12\,E_p - 2901$$

with the standard error 180 and the R² equal to
0.977.

Thus  subdivision  of  the  State  gives  significantly 
better  predictive  equations  and  is  more  logical  than 
grouping  all  into  one  equation.  The  standard  error  of 
estimate  of  the  western  43  watersheds  is  only  slightly 
improved,  but  the  standard  error  for  the  47  eastern 
watersheds  is  less  than  one-fourth  of  that  from  the 
equation  developed  from  90  observations.  The 
improvement  in  predictive  ability  is  also  shown  in 
figures  9  and  10  for  the  43  western  watersheds  and 
47  eastern  watersheds,  respectively.  No  comparison 
was  made  on  the  basis  of  four  regions,  although  it 
should  show  even  more  improvement. 
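
The size of the improvement can be checked directly from the standard errors quoted above. The sketch below only repeats that arithmetic; the dictionary and function names are illustrative.

```python
# Standard errors of estimate quoted in the text.
se_statewide_equation = {"west": 1044, "east": 832}   # 90-station equation applied to each region
se_regional_equation = {"west": 1009, "east": 180}    # equation fitted within each region


def se_ratio(region):
    """Regional standard error as a fraction of the statewide-equation value."""
    return se_regional_equation[region] / se_statewide_equation[region]


for region in ("west", "east"):
    print(f"{region}: {se_ratio(region):.3f}")
```

The eastern ratio is about 0.22, which is the "less than one-fourth" noted above, while the western ratio is about 0.97, a slight improvement.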

The  improvement  in  prediction  is  not  the  only 
justification  of  regionalization  using  discriminant 
analysis.  One  other  point  is  that  when  a  number  of 
watersheds  are  used  to  develop  prediction  equations 
for  use  on  ungaged  watersheds  nearby,  we  will  need 
to  know  if  the  variable  being  predicted  is  regionally 
or geographically distributed or not, or if the
distribution  is  regional  but  not  in  a  geographical 
sense.  For  example,  it  is  possible  that  an  ungaged 
watershed  can  be  associated  with  an  area,  but  there 
may  be  no  data  available  to  associate  it  with  the 
area  any  other  way  than  by  discriminant  analysis. 


Even though there is no proof that the iterative
scheme described and illustrated above will yield
the optimum solution to the problem of group
identification, it is interesting that on every occasion,
except one, in which only two groups were
considered at a time, extremely different starting
points converged to the same solution.
However, in not all cases was there a single unique
solution. In one case, the result oscillated through
three solutions in which three or four watersheds
were rotated between the two groups. This happened
when group separation was not very significant. It
was also observed that a unique solution was sometimes
attained with group separation significantly
lower, as indicated by the D², than at previous iterative
points in the cycle. However, despite
these objections, the results of these tests indicate
that this technique using discriminant analysis
may be a powerful tool in the regionalization of
hydrologic data. It is hoped that additional investigation
will answer some of the questions raised.

Figure 5. - Geographical distribution of watersheds (two and four groups).

Figure 6. - Distribution of discriminant scores for two regions of Oklahoma (first estimate).

Canonical  Regression 

Up to this point, the discussion has been centered
on developing a simple linear model for the prediction
of various hydrologic characteristics; mean
annual flow was used in the previous examples. In
the last several years, enough data have been collected
that fairly good watershed models have
been developed. These models are generally stochastic,
with varying degrees of deterministic and
probabilistic elements. The Stanford watershed
model, for example, is primarily deterministic with
very little probabilistic element. In contrast to this,
many of the multiple station flow models are based
upon statistical flow characteristics and have a
large probabilistic element. Such models may have
many parameters that must be estimated if they are
to be applied to unmeasured watersheds. Consider
for example the mean, standard deviation, and skew
parameters defining the distribution of rainfall.
Suppose these three parameters were not available
at a certain location and had to be predicted from
other locations. The technique being used most
frequently to estimate these parameters is the stepwise
method of relating each of the parameters to
various watershed characteristics. As is often the
case, the equations may be fairly accurate for the
prediction of the mean and maybe the standard
deviation, but the prediction of the skew coefficient
is likely to be very poor. Thus if the mean, standard
deviation, and skew coefficients predicted from
these equations are used to generate synthetic
sequences, then the validity of the sequences may
be subject to question. The validity of the synthetic
sequences could be improved if the correct correlation
between the three parameters is preserved in
the generated data. A more obvious example may
be the characteristics of a unit hydrograph: the
volume, peak rate, lag time, and recession parameters.
If the right correlation is not retained, it
might be impossible to define its shape.

If the conventional technique of estimating these
parameters from the watershed characteristics using
some form of a stepwise selection of variables is
used to get the prediction equations, then the correlation
between the parameters is not guaranteed.
It would of course be guaranteed if all the regression
equations explained 100 percent of the variance in
the dependent variable. Since this almost never
happens, there is a good possibility that the intercorrelations
between parameters are not right.

It is possible to calculate a series of equations to
predict a group of parameters such that their intercorrelations
are maintained. Consider the results of
a canonical correlation between a group of p response
(dependent) variables and q causative
(independent) variables. In this case,
assume that p is less than or equal to q. Output from
the canonical correlation may be summarized as
follows:

$$
\begin{aligned}
X_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1q}x_q, & b_{11}y_1 + b_{12}y_2 + \cdots + b_{1p}y_p &= Y_1 \\
X_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2q}x_q, & b_{21}y_1 + b_{22}y_2 + \cdots + b_{2p}y_p &= Y_2 \\
&\;\;\vdots & &\;\;\vdots \\
X_p &= a_{p1}x_1 + a_{p2}x_2 + \cdots + a_{pq}x_q, & b_{p1}y_1 + b_{p2}y_2 + \cdots + b_{pp}y_p &= Y_p
\end{aligned}
$$


in which the first row represents a linear combination
of the q independent variables $x_i$ and p dependent
variables $y_i$ that produces the highest
possible correlation that can be obtained between
these two sets of variables. The second row is the
highest possible correlation that can be obtained
between the two sets after the effects of the first
row are eliminated. There will be p such vector sets.
If we let $\lambda_i$ be the correlation between the ith set,
the following equation may be written

$$AX = \lambda BY \qquad (3)$$

in which A and B are coefficient matrices, p by q
and p by p, respectively; X and Y are vectors of the
q independent and p dependent variables, respectively;
and $\lambda$ is a p by p diagonal matrix of correlations.
By solving the equation for Y, it is possible to
obtain p linear equations for estimating the response
variables

$$(\lambda B)^{-1} A X = Y. \qquad (4)$$

It may be shown that the intercorrelations
between the response variables are retained by
this approach. It may also be shown that the regression
equations are identical to multiple regressions
on the q causative variables. If this is the case,
then it is possible to retain the intercorrelation in
the response variables if the same causative variables
are used in all regression equations.

The problem again arises as to how to select the
variables for these regression equations. Canonical
analysis appears to be a good possibility. In a canonical
correlation, vector weights applied to both the
response and causative variables are calculated to
maximize the correlation between the two sets. In
output from most canonical analysis computer programs,
all variables are analyzed in standardized
format; therefore, the variable coefficients are an
index of the contribution of a specific variable to the
correlation. It is also possible to test the significance
of each of the sets of correlation coefficients using
the $\chi^2$ statistic. The degrees of freedom and $\chi^2$ are
part of the output of most canonical analysis computer
programs.
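
The significance test on successive roots can be illustrated with a short calculation. The text does not name the formula behind the $\chi^2$ values; this sketch assumes Bartlett's approximation, $\chi^2 = -[n - 1 - (p + q + 1)/2] \ln \prod (1 - R_i^2)$ taken over the roots not yet removed, with $(p - i)(q - i)$ degrees of freedom for the root at zero-based position i. The function name `bartlett_chi_square` is hypothetical. Applied to the canonical correlations reported in table 9 for 47 watersheds, it gives values close to the tabulated ones.

```python
import math


def bartlett_chi_square(canonical_r, n, p, q):
    """Chi-square statistic and degrees of freedom for each successive root.

    canonical_r: canonical correlations, largest first.
    n: number of observations; p, q: sizes of the two variable sets.
    Assumes Bartlett's approximation (an assumption, not stated in the text).
    """
    factor = n - 1 - (p + q + 1) / 2.0
    results = []
    for i in range(len(canonical_r)):
        # product of (1 - R^2) over the roots not yet removed, in log form
        log_lambda = sum(math.log(1.0 - r * r) for r in canonical_r[i:])
        chi_square = -factor * log_lambda
        dof = (p - i) * (q - i)
        results.append((chi_square, dof))
    return results


# Canonical correlations for 3 hydrologic and 3 watershed variables (table 9)
tests = bartlett_chi_square([0.992, 0.810, 0.131], n=47, p=3, q=3)
for chi_square, dof in tests:
    print(round(chi_square, 2), dof)
```

Each statistic is then compared with the tabulated $\chi^2_{.05}$ for its degrees of freedom; a root is taken as significant when the statistic exceeds that value.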

Examination of the coefficients of the causative or
independent variables in the significant sets will
be an indication of the variables that should be used
in the canonical regressions. Now let us suppose
that in the canonical regression only r (fewer than p) sets
of correlations are significant. We would therefore
have fewer sets of equations than we have variables,
and the solution for the set of independent variables
would be indeterminate. It is still possible to get an
approximate solution by selecting the $p - r$ most
independent response variables and regressing
these on the q causative variables. These regression
equations in standardized form may be substituted
into the significant sets of correlations, equation 3.
Simplifying, one obtains r equations in r unknowns.
If a simple multiple regression is used to regress
the $p - r$ dependent variables on the q independent
variables, then substitution into equation 3 will
yield results equivalent to using all possible sets of
correlations rather than only the significant ones.
It would therefore be better to use a components
regression similar to the one used to get equations
8 through 10 in table 2.

Figure 7. Distribution of discriminant scores for two regions of Oklahoma (at convergence).


The canonical regression described in these paragraphs
may be illustrated by again considering the
data set used in the previous discussions. For illustrative
purposes only, suppose that we would like to
generate stochastically the mean annual flow and
peak rate at an unmeasured stream. We would like
to estimate from watershed characteristics the mean
annual flow, its standard deviation, and the maximum
annual flow rate. Since the maximum annual flow
has a recurrence interval of about once every 2.3
years, the peak rate at a 2-year frequency of occurrence
will be substituted for maximum annual flow.
The objective is therefore to calculate three regression
equations for mean annual flow, standard
deviation of mean annual flow, and 2-year peak rate
from watershed characteristics such that the correlation
between these three variables is retained.
An example of such a correlation matrix is shown in
table 7.

Table 7. Correlation matrix of mean annual flow,
standard deviation of mean annual flow, and peak
flow rate for a 2-year recurrence interval

Variable       Qa        Sa        P2

Qa            1.00      0.990     0.897
Sa             .990     1.00       .853
P2             .897      .853     1.00

Note: These data are based on an analysis of 47 watersheds in
eastern Oklahoma.

A canonical analysis between these three variables
and the 12 watershed characteristics is
presented in table 8. This analysis shows that there
are two roots or vector combinations that are significant
(see the $\chi^2$ values). The table also shows that
the prominent watershed characteristics in these
vectors are drainage area, length of watershed, and
average basin precipitation (underlined values). A
canonical correlation between the hydrologic
characteristics for a 2-year return period and the
watershed characteristics of drainage area, watershed
length, and average basin rainfall is presented
in table 9. Again, as in the previous table,
only two roots are significant. Therefore, an indeterminate
condition exists as there are only two
equations in three unknowns. Thus one of the variables
must be regressed on the three watershed
characteristics. Table 7 shows that the peak rate
of runoff is the most independent hydrologic
characteristic. A components regression of this
variable on the three watershed characteristics
gives the following equation. Only two roots were
significant in this regression.

Standardized form:

$$P_2 = 0.417A + 0.423L + 0.444P \qquad (5a)$$

Raw data form:

$$P_2 = -28{,}565 + 2.82A + 70.0L + 884P \qquad (5b)$$
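Substitution of equation 5a for $P_2$ in the first two
vectors of the canonical correlation and setting the
vectors up in the form of equation 3 yields the
following simultaneous equations (in standardized
form):

$$0.976\,\frac{A-\bar{A}}{\sigma_A} - 0.0412\,\frac{L-\bar{L}}{\sigma_L} + 0.124\,\frac{P-\bar{P}}{\sigma_P} = -0.0540\,\frac{Q_a-\bar{Q}_a}{\sigma_{Q_a}} + 0.997\,\frac{S_a-\bar{S}_a}{\sigma_{S_a}} \qquad (6a)$$

$$-0.155\,\frac{A-\bar{A}}{\sigma_A} + 0.114\,\frac{L-\bar{L}}{\sigma_L} + 0.0590\,\frac{P-\bar{P}}{\sigma_P} = 0.688\,\frac{Q_a-\bar{Q}_a}{\sigma_{Q_a}} - 0.724\,\frac{S_a-\bar{S}_a}{\sigma_{S_a}} \qquad (6b)$$
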


Solution  of  these  equations  yields  the  following 
equations  for  the  regression  of  mean  annual  flow 
and  standard  deviation  of  mean  annual  flow: 
Figure 10. Log comparison of general and regional prediction equations.


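$$\frac{Q_a-\bar{Q}_a}{\sigma_{Q_a}} = 0.854\,\frac{A-\bar{A}}{\sigma_A} + 0.130\,\frac{L-\bar{L}}{\sigma_L} + 0.230\,\frac{P-\bar{P}}{\sigma_P} \qquad (7a)$$

or

$$Q_a = -1{,}617 + 0.521A + 1.94L + 41.3P \qquad (7b)$$

$$\frac{S_a-\bar{S}_a}{\sigma_{S_a}} = 1.03\,\frac{A-\bar{A}}{\sigma_A} - 0.034\,\frac{L-\bar{L}}{\sigma_L} + 0.137\,\frac{P-\bar{P}}{\sigma_P} \qquad (7c)$$

or

$$S_a = -599 + 0.407A - 0.332L + 16.0P \qquad (7d)$$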
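
The substitution and solution steps can be checked numerically. The sketch below starts from the table 9 vectors and equation 5a, forms the two simultaneous equations, and solves them by Cramer's rule. It assumes each significant vector is applied in the form $R_i\,(a_i \cdot x) = b_i \cdot y$, an orientation chosen here because it reproduces the printed coefficients of equations 6a and 6b; the variable names are hypothetical.

```python
# Canonical vectors for the two significant roots (table 9):
# (R, watershed coefficients for A, L, P, hydrologic coefficients for Qa, Sa, P2)
roots = [
    (0.992, (0.959, -0.067, 0.099), (-0.054, 0.997, -0.059)),
    (0.810, (-0.163, 0.170, 0.103), (0.688, -0.724, 0.055)),
]
p2_on_alp = (0.417, 0.423, 0.444)  # equation 5a, standardized form

lhs = []  # coefficients of A, L, P after substituting 5a for P2
rhs = []  # coefficients of Qa and Sa
for r, a, b in roots:
    # assumed form: r * (a . x) = b_Qa * Qa + b_Sa * Sa + b_P2 * P2,
    # with the P2 term moved to the watershed side of the equation
    lhs.append([r * a[j] - b[2] * p2_on_alp[j] for j in range(3)])
    rhs.append((b[0], b[1]))

# Solve the 2 x 2 system for standardized Qa and Sa by Cramer's rule.
(m11, m12), (m21, m22) = rhs
det = m11 * m22 - m12 * m21
qa = [(m22 * lhs[0][j] - m12 * lhs[1][j]) / det for j in range(3)]
sa = [(m11 * lhs[1][j] - m21 * lhs[0][j]) / det for j in range(3)]

print([round(c, 3) for c in qa])  # coefficients of A, L, P for Qa
print([round(c, 3) for c in sa])  # coefficients of A, L, P for Sa
```

Because the inputs are rounded to three figures, the results agree with equations 7a and 7c only to about the second decimal place.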
Table 8. Canonical correlation between three hydrologic variables (mean
annual flow, standard deviation of mean annual flow, and peak rate of flow for
a 2-year occurrence interval) and 12 watershed characteristics of 47 watersheds in
eastern Oklahoma

[Underlined values indicate prominent watershed characteristics.]

        Watershed characteristics (1)                 Hydrologic characteristics

First root: $R = 0.993$; $R^2 = 0.986$; $\chi^2 = 236.2$; d.f. = 36; $\chi^2_{.05} = 51.0$

  1.   0.977      5.  -0.011       9.   0.027         1.  -0.028
  2.   -.011      6.   -.011      10.    .024         2.    .996
  3.   -.079      7.    .059      11.    .019         3.   -.090
  4.    .010      8.    .031      12.    .009

Second root: $R = 0.893$; $R^2 = 0.797$; $\chi^2 = 73.0$; d.f. = 22; $\chi^2_{.05} = 33.9$

  1.  -0.150      5.   0.015       9.  -0.078         1.   0.676
  2.   -.009      6.    .028      10.    .065         2.   -.733
  3.    .156      7.    .265      11.   -.002         3.    .082
  4.   -.004      8.   -.074      12.    .000

Third root: $R = 0.527$; $R^2 = 0.278$; $\chi^2 = 12.4$; d.f. = 10; $\chi^2_{.05} = 18.3$

  1.   0.126      5.   0.021       9.   0.066         1.  -0.781
  2.   -.068      6.    .038      10.   -.082         2.    .580
  3.   -.180      7.   -.031      11.   -.023         3.    .230
  4.   -.003      8.   -.090      12.   -.005

(1) Numbers 1 to 12 correspond to the 12 watershed characteristics given in table 1.


A  summary  of  the  three  equations  showing  their 
correlation  coefficients  and  standard  errors  is  shown 
in  table  10.  These  equations  should  retain  most  of 
the  correlation  between  the  three  hydrologic 
variables  and,  as  shown  by  their  standard  errors, 
have  a  good  fit  to  the  data. 
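
As a usage illustration, the three raw-data equations (5b, 7b, and 7d, summarized in table 10) can be applied together to one watershed. The watershed values below are hypothetical, and the units are those of table 1, which are not restated here; the function name `predict` is likewise an assumption.

```python
# Raw-data coefficients from table 10: (constant, A, L, P)
EQUATIONS = {
    "Qa": (-1617.0, 0.521, 1.94, 41.3),    # mean annual flow, equation 7b
    "Sa": (-599.0, 0.407, -0.332, 16.0),   # standard deviation, equation 7d
    "P2": (-28565.0, 2.82, 70.0, 884.0),   # 2-year peak rate, equation 5b
}


def predict(area, length, precipitation):
    """Evaluate all three equations from the same three watershed variables."""
    return {
        name: c0 + ca * area + cl * length + cp * precipitation
        for name, (c0, ca, cl, cp) in EQUATIONS.items()
    }


# Hypothetical watershed: A = 1000, L = 50, P = 40
estimates = predict(area=1000.0, length=50.0, precipitation=40.0)
print(estimates)
```

All three estimates are driven by the same A, L, and P, which is the property the canonical regression relies on to carry the intercorrelation of the response variables into the predictions.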

If  all  correlation  vectors  in  the  canonical  analysis 
are  significant,  then  ordinary  regression  equations 
may be made between the dependent and independent
variables. Under these circumstances, the usual
supporting  statistical  data  such  as  standard  errors 
of  the  coefficients  and  limits  on  prediction  can  be 
obtained.  If,  however,  fewer  vectors  are  used,  then 
the  means  of  calculating  the  supporting  data  are  not 
yet  developed. 

Conclusions 

A few examples have been presented that
illustrate some of the many applications of multivariate
statistics in the field of hydrology. Factor
and components analysis, canonical correlation,
and discriminant analysis have been illustrated.
The possibility of developing component and canonical
regression equations and of using discriminant
analysis as the major tool in a cluster analysis for
isolating regions of similar hydrologic characteristics
has been illustrated. Data consisting of 37 hydrologic
or response variables and 12 watershed or
causative variables for 90 runoff measuring stations
in Oklahoma were used to illustrate the analyses.

In the discussion on regression, 11 different
regression-type equations for predicting mean
annual flow from watershed characteristics were
described. Advantages and disadvantages of the
various equations were discussed. The common
multiple regression and stepwise regression techniques
produce equations such that standard errors
of their coefficients and confidence limits may be
determined. Equations based on factor analysis
and components regression may have slightly more
logical coefficients, but standard errors of the
coefficients and confidence limits are impossible to
define, at least under present evaluation techniques.
In general, most equations produced about the same
standard error of estimate, although quite a large
variety of independent variables were selected. One
equation, however, shows promise of being worthy
of further consideration. In this equation, the response
variable was adjusted by dividing by the two
most prominent watershed variables before being
regressed on the watershed characteristics. It
resulted in an equation with a standard error of
about one-third that of the other equations.

Table 9. Canonical correlation between mean
annual flow, standard deviation of mean annual
flow, and peak rate of flow for a 2-year recurrence
interval; and drainage area, watershed length,
and average basin rainfall for 47 watersheds in
eastern Oklahoma

    Watershed characteristics      Hydrologic characteristics

First root: $R = 0.992$; $R^2 = 0.984$; $\chi^2 = 222.1$; d.f. = 9; $\chi^2_{.05} = 16.9$

    1.   0.959                     1.  -0.054
    2.   -.067                     2.    .997
    3.    .099                     3.   -.059

Second root: $R = 0.810$; $R^2 = 0.657$; $\chi^2 = 46.2$; d.f. = 4; $\chi^2_{.05} = 9.49$

    1.  -0.163                     1.   0.688
    2.    .170                     2.   -.724
    3.    .103                     3.    .055

Third root: $R = 0.131$; $R^2 = 0.017$; $\chi^2 = 0.74$; d.f. = 1; $\chi^2_{.05} = 3.84$

    1.  -0.222                     1.   0.784
    2.    .237                     2.   -.572
    3.   -.041                     3.   -.242

The second part of the paper presented a discussion
of regionalization and the need for defining
regions of similar hydrologic response. The hydrologic
variables were separated into similar groups
using a factor analysis. Five distinct groups were
found: (1) the means and standard deviations of the
annual and monthly flow rates for the winter-spring
period, (2) the means and standard deviations of the
monthly flow rates for the summer-fall period, (3)
the peak rates of flow at various frequencies of
occurrence, (4) the maximum 7-day flow volumes
for various frequencies of occurrence, and (5) the
minimum 7-day flow volume for various frequencies
of occurrence. To illustrate the use of discriminant
scores as an aid in regionalization, the mean summer
flow rates and standard deviations were used as
discriminants in a discriminant analysis. After an
initial assignment of data from the watersheds
into two groups, the discriminant scores were used
along with an algorithm to reassign the misclassified
data points. Upon convergence of the scheme, good
group discrimination was obtained. The groups had
good geographical distribution for numbers 1, 2, and
3 above, generally dividing the State in a north-south
direction. Group separation for peak rates did not
show unique geographical discrimination; however,
a discriminant analysis using the 12 watershed
characteristics showed that the amount and intensity
of rainfall and watershed slope would be good
indicators of the group to which the watershed
should be assigned. Another example in which the
discriminators were the means and standard deviations
of the annual flow and the peak rate of flow for
a 2-year recurrence interval showed how the cluster
analysis could be used to divide the State into more
than two regions. Predictive equations were also
calculated to show the advantage of regionalization
in the development of response models.


Table 10. Canonical regression of Qa, Sa, and P2 for watersheds
in eastern Oklahoma

Equation       A         L         P       Constant       R        SEE

Qa           0.521      1.94      41.3       -1617      0.974      190
Sa            .407      -.332     16.0        -599       .983       99
P2           2.82      70.0      884        -28565       .827     5452

The last part of the paper was concerned with
developing a technique to calculate the regression
equations for a group of response variables such that
their intercorrelation would be retained in prediction.
The problem was also one of selecting the independent
variables for use in these equations. Canonical
correlation was shown to be an acceptable way
of finding the watershed variables associated with
the response variables. In a canonical correlation,
linear combinations of the watershed variables
are correlated with linear combinations of the
response variables. The mean annual flow, its standard
deviation, and peak rate for a 2-year recurrence
interval were correlated with the 12 watershed
variables. In general, two sets of independent correlations
between these variables were found to be
significant. Coefficients of the 12 watershed characteristics
showed drainage area, watershed length,
and precipitation to be the most significant variables.
It was shown that the vectors of a canonical correlation
could be used to get a series of regression
equations for the response variables. If there are an
equal number of response variables and significant
vectors, then the regression equations for the response
variables are identical to multiple regression
equations on the same set of watershed characteristics.
If there are fewer significant vectors than
response variables, the most independent response
variables are regressed on the watershed variables and the resulting
regression equation, in standardized form, is substituted
into the significant vectors. In this case, only
one response variable, peak rate of flow, was
regressed on the three watershed variables and substituted
into the two significant vectors. Solving the two
simultaneous equations of the canonical correlation
then resulted in the regression equations for the
other two response variables.

These discussions not only illustrate a few of the
many uses of multivariate analysis but also present
some concepts that may be worth considering in the
development of hydrologic predictive equations for
various regions of the country.
STATISTICAL TOLERANCE LIMITS FOR A
PEARSON TYPE III DISTRIBUTION

By J. R. Wallace and D. G. Fontane


Abstract 

Flood frequency analysis is a statistical prediction
of future events. From a sample of flood data,
specified recurrence interval floods can be estimated.
Because these specified recurrence interval
floods are determined from a sample of all possible
floods, they can be expected to vary in the future.
Statistical tolerance limits provide a means of
estimating the range of future variation in a specified
recurrence interval flood.

Methods for determining statistical tolerance
limits for the Pearson Type III distribution, the
distribution which forms the basis for flood frequency
analyses performed by Federal agencies,
have not been available. Development of tolerance
limits depends upon the determination of tolerance
factors. A theoretical approach to the development
of tolerance factors for the Pearson Type III distribution
was attempted but was not successful. A
numerical technique was then tried. By simulation
on a digital computer, samples from a Pearson
Type III distribution were generated. A set of empirical
tolerance factors was developed directly
from the generated data. These tolerance factors
exhibit several expected characteristics: the factors
increase with skew, probability (E), and population
proportion, and decrease with sample size.

Introduction 

Figure 1 shows a series of points which represent
the largest flow values for each year for 30 years on a
particular river. These points have been plotted on
normal probability paper, and a straight line has
been fitted to the data. The line is called a flood
frequency curve. Lines similar to the one in figure
1 are often used to determine such quantities as
the average annual damage to be expected from
flooding or to determine the required size of hydraulic
structures.

Flood  flows  are  usually  classified  by  return  period. 
For  example,  the  20-year  flood  from  figure  1  is 
approximately  5,600  cubic  feet  per  second  (c.f.s.). 
This  last  statement  implies  that,  on  the  average,  a 
flow  of  5,600  c.f.s.  will  be  equaled  or  exceeded  one 
time  in  20  years.  It  is  clear  that  this  statement  is  not 
precise  because  it  is  based  on  a  sample.  Only  in  the 
event  that  the  line  exactly  represents  the  population 
does  the  statement  become  precise. 

A more precise statement can be made when only
the sample is known. Again referring to figure 1, it
can be stated that the probability of the true 20-year
flood being equal to or less than 6,000 c.f.s. is 0.90,
the probability of it being equal to or less than 6,200
c.f.s. is 0.95, and the probability of it being equal to
or less than 6,500 c.f.s. is 0.99. This implies that the
magnitude of the true 20-year flood cannot be determined
from a historical record. However, when the
flood population is normally distributed, statistical
tolerance limits can be evaluated from the historical
record. The statistical tolerance limits define a
range within which the true flood magnitudes of
various return periods can be expected to lie. Tolerance
limits for the example problem are shown
graphically on figure 1 and provide the basis for
the probabilistic ranges of the 20-year flood stated
above.


Statistical  Tolerance  Limits 

Statistical  tolerance  limits  may  be  two  sided  or 
one  sided.  One-sided  limits  may  be  either  upper  or 
lower  limits.  The  following  discussion  will  be  directed 
toward the use and calculation of one-sided upper
tolerance  limits. 

Tolerance  limits  are  computed  from  tolerance 
factors  by  an  equation  of  the  form: 

$$l_u = \bar{x} + Ks \qquad (1)$$

where:

$\bar{x}$ = sample mean
$s$ = sample standard deviation
$K$ = tolerance factor
$l_u$ = tolerance limit

A  one-sided  upper  tolerance  limit  is  represented 
in  figure  2.  The  unknown  population  distribution  is 
represented by the solid line and the distribution of
the  sample  is  shown  as  a  dashed  line.  A  proportion  P 
of  the  population  is  less  than  the  value  L.  (In  terms 
of  the  flood  frequency  example,  L  might  represent 
the  true  20-year  flood,  and  P  would  then  be  0.95.)  It 
is  clear  from  equation  1  that  lu  is  a  random  variable 
because  it  is  a  combination  of  two  random  variables, 
the  sample  mean  and  the  sample  standard  deviation. 

Application of tolerance limits requires that a
value be found for the tolerance factor K such that
the upper tolerance limit, $l_u$, computed from equation
1 will have a value such that the probability
that at least a proportion, P, of the population is
less than $l_u$ is equal to some specified value E. A
one-sided upper tolerance limit can thus be defined
as that limit, $l_u$, such that with probability E the
limit $l_u$ is greater than or equal to the population
limit, L, as follows:

$$P(l_u \ge L) = E. \qquad (2)$$

The  Noncentral  t  Statistic 

Evaluation of a tolerance limit depends upon the
evaluation of the tolerance factor. From equation 1
it can be noted that $\bar{x}$ and s are computed from the
sample, and the constant K is the only remaining
term required. Because of this, the constant K
must be a function of the degree of probability
statement, E; the population limit, L, which is a
measure of the required proportion; and the sample
size used to compute $\bar{x}$ and s. In their development
of the tolerance factors for a one-sided tolerance
limit, Johnson and Welch (6) made use of the noncentral
t statistic. The noncentral t statistic is
defined by the following equation:


$$t = \frac{z + \delta}{w^{1/2}} \qquad (3)$$

where z is a quantity distributed normally
about zero with unit standard deviation,
w is a quantity distributed independently
as $\chi^2/f$, where $\chi^2$ represents the chi-square
distribution and f is the number
of degrees of freedom,
and $\delta$ is a constant.

This statistic is distributed in a manner that
depends only on $\delta$ and f.

An expression for the population limit L can be
written in a form similar to equation 1:

$$L = \bar{x} + k_u s \qquad (4)$$

There is a particular value of $k_u$ that will satisfy
equation 4 for any given sample. Rewriting equation 4:

$$k_u = \frac{L - \bar{x}}{s} \qquad (5)$$

It should be noted that $\bar{x}$, s, and $k_u$ are random
variables and L is a constant related to the proportion,
P. Multiplying equation 5 by $n^{1/2}$ (n is the
sample size) yields

$$n^{1/2} k_u = n^{1/2}\,\frac{(L - \bar{x})}{s} \qquad (6)$$

Figure 2. A one-sided upper tolerance limit.
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Further  manipulation  of  equation  6  gives  From  equations  1  and  4  it  can  be  seen  that  if: 


(7) 

where  jli  is  the  population  mean 

and  o"  is  the  population  standard  deviation 

Equation  7  is  equivalent  to  equation  6.  For  any 
distribution,  the  population  limit,  L,  mean,  fx  and 
standard  deviation,  cr,  are  constants.  Therefore. 

  (L  —  ijl)  is  a  constant  term  and  is  equivalent  to 

a 

8  in  equation  3.  The  sample  means  from  any  dis- 
tribution are  approximately  normally  distributed 
with a mean equal to the population mean, $\mu$, and a standard
deviation equal to the population standard deviation divided by the
square root of the sample size, $\sigma/n^{1/2}$ (2, p. 151).
Therefore, the term $n^{1/2}(\bar{x} - \mu)/\sigma$ will be normally
distributed with a mean equal to zero and a standard deviation equal
to one. This is equivalent to $z$ in equation 3. For a normal
distribution, a function of the ratio of sample to population variance
is chi-square ($\chi^2$) distributed. More precisely,
$(n-1)s^2/\sigma^2$ is distributed as $\chi^2$ with $(n-1)$ degrees of
freedom (4, p. 144). Therefore, for the normal distribution,
$s^2/\sigma^2$ is distributed as $\chi^2/f$, which is equivalent to
$w$ in equation 3. For the normal distribution then, equation 7 is
equivalent to equation 3 and $n^{1/2}k_u$ has a noncentral "t"
distribution:

$$n^{1/2} k_u = t(n-1,\; n^{1/2}U,\; E) \tag{8}$$

where $(n-1)$ is the number of degrees of freedom, $n^{1/2}U$ is a
measure of the required proportion or limit as $U = (L - \mu)/\sigma$,
and $E$ is the probability or confidence desired. Given a particular
sample size, proportion or limit, and probability, the corresponding
tolerance factor can be computed from

$$k_u = \frac{t(n-1,\; n^{1/2}U,\; E)}{n^{1/2}}. \tag{9}$$

Therefore, if:

$$P(l_u \ge L) = E$$

then,

$$P(K \ge k_u) = E$$

or

$$P(k_u \le K) = E.$$

Thus, the value of the tolerance factor $K$ required to satisfy the
definition of the upper tolerance limit can be determined from a
knowledge of the noncentral t distribution. Tolerance factors of this
type have been tabulated (7).

Pearson Type III Distribution

The normal distribution has been applied to flood frequency analysis
on many occasions. In those situations where the normal distribution
is acceptable, tolerance limits can be determined from the noncentral
t distribution. However, most Federal agencies have adopted the
proposal of the Water Resources Council (1) that the log-Pearson
Type III distribution be used as the base method for flood frequency
analysis.

The Pearson Type III distribution, developed by Karl Pearson, has the
form:

$$y = y_0 (1 + x/a)^{p} e^{-px/a}, \qquad -a \le x < \infty \tag{10}$$

where $p$ is the skewness parameter, $a$ is the lower bound, and
$y_0$ is the value of the curve at the mode ($x = 0$). The term
"log-Pearson" arises from the logarithmic transformation of flood data
that is required to make the data fit the Pearson Type III
distribution.

The form of the Pearson Type III distribution precludes a theoretical
treatment along the lines employed for the normal distribution. The
terms $z$ and $\delta$ in equation 3 did not depend on the population
distribution. However, use of the $w$ term was based on a normal
population for which the sample variance is known to be distributed
as $\chi^2$. A similar statement cannot be made for the sample
variance from a Pearson Type III distribution.
Experimentally Determined Tolerance Factors

The first phase of the current work was directed
toward  a  theoretical  development  of  tolerance 
factors  for  a  Pearson  Type  III  distribution.  This 
phase  of  the  research  was  not  successful;  therefore, 
a  different  approach  was  taken.  Simulation  tech- 
niques employing  a  digital  computer  were  used  to 
generate  samples  from  Pearson  Type  III  distribu- 
tions. These  samples  provided  the  empirical  data 
required  to  establish  the  distribution  of  various 
statistics  of  the  Pearson  Type  III  distribution. 
Finally,  the  data  were  used  to  experimentally 
determine  statistical  tolerance  factors. 

In  the  experimental  phase  of  the  research,  the 
distribution of the tolerance factor:

$$k_u = \frac{L - \bar{x}}{s}$$

was determined empirically. The statistics were
computed  from  samples  developed  by  simulation 
techniques  on  a  digital  computer.  Rather  than  select 
samples  directly  from  a  Pearson  Type  III  distribu- 
tion, the  Pearson  Type  III  distribution  was  trans- 
formed to  an  equivalent  Gamma  distribution.  The 
transformation  is  given  in  the  Appendix.  The 
Gamma  distribution  has  the  form: 

$$f(x) = \frac{a^{k} x^{k-1} e^{-ax}}{\Gamma(k)}, \qquad 0 \le x < \infty \tag{11}$$

where $a$ is a scale parameter, $k$ is a skewness parameter, and
$\Gamma(k)$ is the gamma function. The mean of this distribution is
$k/a$, the variance is equal to $k/a^2$, and the skew is equal to
$1/k^{1/2}$. The technique used to generate samples from the Gamma
distribution is described by the equation (8):

$$x = -\frac{1}{a} \log_e \Big( \prod_{i=1}^{k} r_i \Big) \tag{12}$$

where $x$ has a Gamma distribution with parameters $a$ and $k$, and
$r_i$ is a random number uniform on the interval (0, 1). For
convenience, the scale parameter, $a$, was set equal to unity. (The
one-sided tolerance factor is independent of $a$.)
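The generation step can be illustrated with a short sketch. It is not the original UNIVAC 1108 routine: it assumes an integer skewness parameter k (so that the sum of k logarithms applies), uses Python's standard generator in place of the tested subroutine, and the function names `gamma_variate` and `gamma_sample` are hypothetical.

```python
import math
import random


def gamma_variate(k, a=1.0, rng=random):
    """Draw one Gamma value with integer shape k and scale parameter a,
    using x = -(1/a) * log(r_1 * r_2 * ... * r_k)."""
    if k < 1 or int(k) != k:
        raise ValueError("this construction needs a positive integer k")
    # 1 - random() lies in (0, 1], so the logarithm is always defined.
    # Summing the logs equals the log of the product and avoids underflow.
    log_product = sum(math.log(1.0 - rng.random()) for _ in range(int(k)))
    return -log_product / a


def gamma_sample(n, k, a=1.0, seed=None):
    """Return n independent Gamma(k, a) values (assumed helper)."""
    rng = random.Random(seed)
    return [gamma_variate(k, a, rng) for _ in range(n)]


# Check the first two moments against k/a and k/a**2 for k = 4, a = 1.
sample = gamma_sample(n=20000, k=4, seed=1971)
mean = sum(sample) / len(sample)
var = sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)
print(f"mean = {mean:.3f}, variance = {var:.3f}")
```

With k = 4 and a = 1 both the mean and the variance should come out close to 4, which is a quick check that the generator matches the stated moments.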

Values of $r_i$ were generated on a UNIVAC 1108 by means of a standard
subroutine available on the computer. The randomness and uniformity
of the values of $r_i$ were thoroughly tested before these values were
used for further computation. When a set of acceptable values of
$r_i$ was generated, this set was used in equation 12 to generate 999
samples of specified size and skew. Sample sizes of 30, 40, 50, 60,
80, and 100 were used; for each sample size, skew values of 0.20
(k = 25), 0.25 (k = 16), 0.50 (k = 4), 0.707 (k = 2), and 1.0 (k = 1)
were selected for study. (At a skew value of zero the Pearson Type
III distribution is not defined; for a skew value of one or larger
the distribution becomes J-shaped and is meaningless for flow
frequency analysis.) The samples were tested to determine the
goodness-of-fit with the particular Pearson Type III population from
which they were drawn.

For a given skew and sample size, each of the 999 samples was used in
equation 5 to compute a value of the tolerance factor $k_u$. The
value of the population limit, L, was computed from tables of
percentage points for the Pearson Type III distribution (5). Upper
population limits, L, corresponding to 90, 95, 99, and 99.9 percent
were utilized in the computations, and for each value of L, tolerance
factors corresponding to values of E of 0.90, 0.95, 0.99, and 0.999
were determined. The 999 values of $k_u$ were then ranked in
ascending order. From the ranked values of $k_u$, a value $K'$ was
determined which was larger than 100E percent of the other values of
$k_u$. An empirical tolerance limit was defined by:

$$l_u = \bar{x} + K's. \tag{13}$$

Therefore, for 100E percent of the samples

$$K' \ge k_u$$

and

$$l_u \ge L.$$

Thus, the approximate empirical probability statement

$$P(l_u \ge L) = E \tag{14}$$

represents the experimentally determined tolerance limits.
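The ranking procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' program: the population limit L is obtained by inverting the Gamma distribution function numerically rather than from Harter's tables, the skewness parameter k is taken to be an integer, the tolerance factor for each sample is computed as (L - mean)/s, and all function names are hypothetical.

```python
import math
import random
import statistics


def erlang_cdf(x, k):
    """Distribution function of a Gamma variate with integer shape k, a = 1."""
    if x <= 0.0:
        return 0.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= x / j          # x**j / j!
        total += term
    return 1.0 - math.exp(-x) * total


def erlang_quantile(p, k):
    """Population limit L with P(X <= L) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while erlang_cdf(hi, k) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def empirical_tolerance_factor(n, k, proportion, E, n_samples=999, seed=0):
    """Rank the sample tolerance factors and return the value K' that is
    at least as large as 100E percent of them."""
    rng = random.Random(seed)
    L = erlang_quantile(proportion, k)
    factors = []
    for _ in range(n_samples):
        xs = [-sum(math.log(1.0 - rng.random()) for _ in range(k))
              for _ in range(n)]
        factors.append((L - statistics.mean(xs)) / statistics.stdev(xs))
    factors.sort()
    rank = math.ceil(E * n_samples)
    return factors[rank - 1]


# Population deviate for k = 4 (mean 4, standard deviation 2), and the
# simulated factor for samples of size 30, proportion 0.90, E = 0.90.
deviate = (erlang_quantile(0.90, 4) - 4.0) / 2.0
K_prime = empirical_tolerance_factor(n=30, k=4, proportion=0.90, E=0.90)
print(f"population deviate = {deviate:.3f}, empirical K' = {K_prime:.2f}")
```

Because only 999 samples are ranked, the simulated factor carries sampling error, which is the reason the published values were smoothed; the value printed here is unsmoothed and depends on the seed.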

Since each set of tolerance factors was based on a set of 999 samples
(for a given sample size and skew) and since these factors were
intended to serve as estimates of the population values, the data
(computed tolerance factors) were "smoothed" by a three-part process.
This smoothing of data was done because it was reasoned that the
population tolerance factors can be represented by smooth, continuous
functions.

The  smoothing  of  the  data  was  accomplished  by 
visual  curve  fitting,  which  allowed  assumed  trends 
in  the  data  as  well  as  continuity  to  be  maintained. 
All  curve  fitting  was  done  by  hand  to  allow  for  the 
weighting  of  the  data,  that  is,  the  tolerance  factors 
for  E  =  0.90  are  more  reliable  than  the  factors  based 
on  E  =  0.999.  In  a  simulation  process  like  the  one 
used  for  this  work,  the  tails  of  a  distribution  are 
the  hardest  part  of  the  distribution  to  define. 
Another  reason  for  manually  fitting  curves  was  to 
maintain  assumed  trends  in  the  tolerance  factors 
in  regard  to  continuity. 

The  purpose  of  the  first  part  of  the  smoothing 
process  was  to  smooth  the  tolerance  factors  over 
the  range  of  probability,  E.  For  a  given  value  of 
skew  and  sample  size,  the  tolerance  factors  for 
each  limit,  L,  were  plotted  on  normal  probability 
paper  as  a  function  of  the  E  values.  Smooth  curves 
were  then  fit  to  the  data  points  (see  fig.  3). 

The  purpose  of  the  second  part  of  the  smoothing 
process  was  to  smooth  the  tolerance  factors  over 
the  range  of  the  skew.  For  a  given  value  of  the 
population  limit  and  sample  size,  the  tolerance 
factors  for  each  probability,  E,  were  read  from  the 
curves  developed  in  the  first  part  of  the  smoothing 
process  (see  fig.  3).  These  adjusted  tolerance 
factors  were  then  plotted  versus  the  values  of  skew 
(see  fig.  4).  At  the  value  of  skew  equal  to  zero,  the 
population  distribution  is  normal  and  the  normal 
tolerance  factors  were  used.  Smooth  curves  were 
fitted  to  the  data. 

The final step in the smoothing process was designed to smooth the
data over the range of the sample size. For a given value of the
population limit and skew, values of the tolerance factors for each
value of E were read from the curves developed in the second part of
the smoothing process (see fig. 4). These tolerance factors were then
plotted versus the reciprocal of the sample size. At a sample size of
infinity, the population is completely defined; therefore, the
tolerance factors become the population deviates. At a value of the
reciprocal of sample size equal to zero (sample size equal to
infinity), the Pearson deviates obtained from Harter's table (5) were
plotted. These values served as a lower bound for the curves that
were fitted to the data (see fig. 5).

Figure 3. Hypothetical sample of the first-part smoothing process.
(Panel: sample size = 80; horizontal axis: probability, E.)

Figure 4. Hypothetical sample of the second-part smoothing process.

Figure 5. Hypothetical sample of the third-part smoothing process.
(Panel: population limit = 95 percent, skew = 0.80; curves labeled by
probability, E; horizontal axis: reciprocal of sample size.)


The  values  of  the  tolerance  factors  were  read  from 
the  curves  developed  in  the  final  part  of  the  smooth- 
ing process  (see  fig.  5)  and  put  into  tabular  form 
(see  table  1). 


Conclusions 

The  one-sided  upper  tolerance  factors  shown  in 
table  1  exhibit  several  expected  characteristics. 
For  example,  the  factors  increase  with  skew, 
probability (E), and population proportion, and
decrease  with  an  increase  in  sample  size.  These 
facts  lead  to  the  conclusion  that  estimates  of 
population  proportions  are  less  reliable,  that  is, 
sample  values  are  expected  to  vary  over  a  greater 
range,  when  one  is  dealing  with  a  Pearson  Type  III 
population  as  opposed  to  a  normal  population. 

The  tolerance  factors  in  table  1  were  developed  to 
provide  estimates  of  the  true  tolerance  factors  for 
Pearson  Type  III  distributions.  The  factors  are  not 
directly applicable to problems such as that outlined
in  figure  1  because  their  use  requires  the  sample  be 
from  a  Pearson  Type  III  distribution  with  known 
skew.  The  skew  can  only  be  estimated  from  a 
sample,  and  it  will  be  necessary  to  use  this  esti- 
mated parameter  in  any  procedure  for  computing 
tolerance  limits.  Hence,  tolerance  factors  based 
on  estimated  skews  are  expected  to  be  larger  than 
those  presented  in  table  1. 


Table 1. Empirical tolerance factors for a Pearson Type III distribution

Skew = 0.20

Sample      Probability, E = 0.90           Probability, E = 0.95
size        Proportion                      Proportion
            0.90   0.95   0.99   0.999      0.90   0.95   0.99   0.999
30          1.79   2.27   3.34   4.68       1.93   2.45   3.57   4.98
40          1.72   2.20   3.24   4.54       1.84   2.34   3.42   4.79
50          1.67   2.15   3.17   4.45       1.78   2.27   3.33   4.66
60          1.63   2.11   3.13   4.39       1.74   2.23   3.26   4.58
70          1.61   2.09   3.09   4.33       1.70   2.19   3.21   4.51
80          1.59   2.06   3.05   4.28       1.67   2.15   3.17   4.44
90          1.57   2.04   3.02   4.24       1.65   2.13   3.13   4.39
100         1.55   2.02   3.00   4.20       1.63   2.10   3.10   4.34
∞           1.32   1.75   2.62   3.67       1.32   1.75   2.62   3.67

Sample      Probability, E = 0.99           Probability, E = 0.999
size        Proportion                      Proportion
            0.90   0.95   0.99   0.999      0.90   0.95   0.99   0.999
30          2.19   2.76   4.01   5.67       2.60   3.25   4.67   6.60
40          2.07   2.61   3.80   5.35       2.40   3.04   4.34   6.08
50          1.99   2.52   3.67   5.15       2.27   2.91   4.14   5.75
60          1.93   2.45   3.58   5.01       2.18   2.81   3.99   5.54
70          1.88   2.40   3.51   4.90       2.11   2.73   3.88   5.38
80          1.84   2.35   3.44   4.80       2.05   2.67   3.79   5.25
90          1.81   2.31   3.39   4.73       1.99   2.61   3.72   5.13
100         1.78   2.27   3.34   4.66       1.95   2.56   3.65   5.05
∞           1.32   1.75   2.62   3.67       1.32   1.75   2.62   3.67
Skew = 0.40

Sample      Probability, E = 0.90           Probability, E = 0.95
size        Proportion                      Proportion
            0.90   0.95   0.99   0.999      0.90   0.95   0.99   0.999
30          1.86   2.45   3.79   5.58       2.01   2.64   4.08   6.00
40          1.78   2.36   3.66   5.42       1.91   2.52   3.90   5.74
50          1.73   2.31   3.58   5.30       1.85   2.44   3.78   5.58
60          1.69   2.27   3.52   5.22       1.80   2.39   3.70   5.44
70          1.66   2.24   3.47   5.15       1.77   2.35   3.63   5.36
80          1.64   2.21   3.43   5.09       1.74   2.31   3.57   5.28
90          1.62   2.19   3.39   5.04       1.71   2.28   3.53   5.22
100         1.60   2.17   3.36   5.00       1.69   2.25   3.48   5.15
∞           1.34   1.84   2.89   4.24       1.34   1.84   2.89   4.24

Sample      Probability, E = 0.99           Probability, E = 0.999
size        Proportion                      Proportion
            0.90   0.95   0.99   0.999      0.90   0.95   0.99   0.999
30          2.29   2.99   4.57   6.80       2.70   3.50   5.27   7.90
40          2.16   2.83   4.33   6.41       2.51   3.28   4.91   7.30
50          2.08   2.73   4.18   6.17       2.39   3.15   4.68   6.92
60          2.02   2.66   4.08   6.00       2.31   3.06   4.53   6.66
70          1.97   2.59   3.99   5.85       2.23   2.98   4.41   6.46
80          1.92   2.54   3.92   5.73       2.17   2.91   4.32   6.29
90          1.89   2.50   3.86   5.63       2.12   2.85   4.24   6.15
100         1.85   2.45   3.80   5.54       2.07   2.80   4.17   6.02
∞           1.34   1.84   2.89   4.24       1.34   1.84   2.89   4.24
Skew = 0.60

Sample      Probability, E = 0.90           Probability, E = 0.95
size        Proportion                      Proportion
            0.90   0.95   0.99   0.999      0.90   0.95   0.99   0.999
30          1.91   2.61   4.21   6.44       2.07   2.83   4.55   6.92
40          1.83   2.52   4.07   6.23       1.97   2.71   4.35   6.63
50          1.78   2.46   3.98   6.10       1.91   2.62   4.22   6.43
60          1.75   2.42   3.91   6.00       1.86   2.56   4.12   6.30
70          1.72   2.38   3.85   5.91       1.82   2.51   4.05   6.19
80          1.69   2.35   3.80   5.84       1.79   2.47   3.99   6.09
90          1.67   2.32   3.76   6.78       1.76   2.43   3.92   6.01
100         1.65   2.29   3.72   5.72       1.74   2.40   3.88   5.94
∞           1.34   1.91   3.15   4.82       1.34   1.91   3.15   4.82

Sample      Probability, E = 0.99           Probability, E = 0.999
size        Proportion                      Proportion
            0.90   0.95   0.99   0.999      0.90   0.95   0.99   0.999
30          2.38   3.21   5.11   7.92       2.81   3.75   5.86   9.14
40          2.24   3.04   4.84   7.45       2.62   3.51   5.50   8.45
50          2.16   2.94   4.68   7.16       2.50   3.36   5.27   8.02
60          2.10   2.86   4.56   6.95       2.41   3.26   5.11   7.72
70          2.04   2.79   4.45   6.78       2.33   3.17   4.97   7.49
80          2.00   2.73   4.37   6.64       2.27   3.10   4.87   7.29
90          1.95   2.68   4.30   6.52       2.21   3.03   4.76   7.12
100         1.92   2.63   4.23   6.40       2.16   2.98   4.68   6.99
∞           1.34   1.91   3.15   4.82       1.34   1.91   3.15   4.82

Skew = 0.80

Sample      Probability, E = 0.90           Probability, E = 0.95
size        Proportion                      Proportion
            0.90   0.95   0.99   0.999      0.90   0.95   0.99   0.999
30          1.93   2.77   4.65   7.28       2.12   3.01   5.03   7.88
40          1.86   2.67   4.48   7.06       2.02   2.88   4.80   7.53
50          1.82   2.60   4.38   6.91       1.96   2.79   4.65   7.31
60          1.78   2.56   4.30   6.80       1.91   2.72   4.55   7.15
70          1.75   2.52   4.23   6.70       1.87   2.67   4.46   7.02
80          1.73   2.48   4.18   6.61       1.83   2.62   4.38   6.90
90          1.70   2.45   4.13   6.54       1.80   2.58   4.32   6.80
100         1.68   2.42   4.08   6.46       1.77   2.54   4.26   6.70
∞           1.33   1.96   3.39   5.37       1.33   1.96   3.39   5.37

Sample      Probability, E = 0.99           Probability, E = 0.999
size        Proportion                      Proportion
            0.90   0.95   0.99   0.999      0.90   0.95   0.99   0.999
30          2.48   3.44   5.68   9.02       2.91   3.98   6.44  10.46
40          2.32   3.26   5.37   8.49       2.72   3.72   6.05   9.65
50          2.23   3.13   5.18   8.15       2.61   3.56   5.80   9.15
60          2.15   3.05   5.04   7.90       2.52   3.44   5.63   8.80
70          2.08   2.97   4.93   7.70       2.43   3.35   5.50   8.53
80          2.04   2.91   4.83   7.54       2.38   3.26   5.38   8.30
90          2.00   2.85   4.75   7.40       2.32   3.19   5.28   8.12
100         1.96   2.80   4.68   7.28       2.27   3.13   5.19   7.95
∞           1.33   1.96   3.39   5.37       1.33   1.96   3.39   5.37

Table 1. Empirical tolerance factors for a Pearson Type III distribution (continued)

Skew = 1.00

Sample      Probability, E = 0.90                       Probability, E = 0.95
size        Proportion                                  Proportion
            0.90         0.95   0.99   0.999            0.90   0.95   0.99   0.999
30          2.02         2.91   5.07   8.11             2.20   3.16   5.51   8.79
40          [illegible]  2.82   4.89   [illegible]      2.09   3.03   5.25   [illegible]
50          1.86         2.75   4.77   7.70             2.02   2.94   5.08   8.18
60          1.82         2.70   4.68   7.58             1.96   2.87   4.97   8.00
70          1.78         2.65   4.61   7.47             1.92   2.82   4.87   7.85
80          1.75         2.61   4.54   7.37             1.87   2.76   4.78   7.72
90          1.72         2.57   4.48   7.28             1.84   2.72   4.71   7.60
100         1.69         2.54   4.43   7.20             1.81   2.68   4.65   7.50
∞           1.30         2.00   3.60   5.91             1.30   2.00   3.60   5.91
LINEAR LEAST SQUARES PREDICTION FOR MULTIVARIATE TIME SERIES WITH
MISSING OBSERVATIONS

By E. J. Gilroy 1


Abstract 

Records  of  hydrologic  phenomena,  such  as 
streamflow,  are  often  incomplete  due  to  either 
missing  past  observations  or  missing  present 
observations  at  certain  sites  discontinued  for 
economic  reasons.  If  the  hydrologic  data  observed 
can  be  considered  statistically  independent  over 
time,  the  missing  observations  may  be  estimated 
by  the  methods  of  multivariate  regression.  How- 
ever, in  many  instances  the  hypothesis  of  statistical 
independence  of  the  data  observed  at  different 
points  in  time  is  rejected.  In  these  cases,  the 
ordinary  regression  procedure  for  estimating  the 
missing  observations  is  not  strictly  applicable. 
The  records  at  the  different  observation  sites  may 
then  be  considered  to  be  incomplete  samples  from 
a  multivariate  time  series.  Under  the  assumption  of 
weak  stationarity  of  the  time  series,  the  technique 
of linear least squares prediction theory may be
applied. The fitting of various suitable models,
such as autoregressive or moving average, leads to
simplified estimates of the missing observations.
The question of the dimensionality of the
multivariate time series to be used is a function of the
increase of estimation.

Introduction 

Annual streamflow records at several sites may indicate the existence
of correlation not only among flows at different sites but also
year-to-year serial correlation between flows at a site. Also,
seasonal streamflow records at one site may indicate not only
correlation between flows in adjacent seasons in the same year but
also year-to-year serial correlation between flows in the same
season. In both of these cases, the streamflow data may be considered
to be a sample from a multivariate time series.

1 U.S. Geological Survey, Water Resources Division, Arlington, Va.

Sometimes  data  at  some  of  the  sites  are  missing 
or  some  of  the  seasonal  flows  may  be  missing  and 
it  is  desired  to  estimate  these  missing  values. 
Another  problem  arises  when  measurements  at 
certain  sites  or  during  certain  seasons  are  discon- 
tinued for  economic  reasons  and  estimates  of  these 
missing  values  have  to  be  made  at  some  future  time. 
A  third  problem  is  that  of  predicting  the  flow  in  the 
next  season  given  the  flows  in  the  previous  season 
of  the  same  year  and  all  the  seasons  for  the  previous 
years. 

Now  if  there  is  no  year-to-year  serial  correlation, 
all  of  these  prediction  problems  reduce  to  ordinary 
multivariate  regression  procedures  as  analyzed 
by  Fiering  (3)  and  Matalas  and  Jacobs  (5).  If 
there  is  no  cross  correlation  between  sites  or  be- 
tween seasons,  then  each  missing  value  must  be 
predicted  from  past  values  at  the  individual  site 
or  for  the  individual  season.  This  type  of  problem 
has  been  considered  in  the  application  of  the  work 
of Box and Jenkins (1) to hydrologic data by
Carlson,  MacCormick,  and  Watts  (2)  for  the 
univariate  case. 

This  paper  outlines  a  technique  for  estimating 
missing  values  in  a  very  general  case  of  missing 
observations  in  a  sample  from  a  weakly  stationary 
multivariate  time  series.  An  application  of  the 
technique  is  given  for  a  bivariate  first  order  auto- 
regressive model.  The  relative  efficiency  of  the  bi- 
variate predictor  to  the  univariate  predictor  is 
given  for  various  values  of  the  serial  correlations 
of  the  two  series. 

Mathematical  Model 

Let $X(t) = \{X_0(t), \ldots, X_p(t)\}$ for
$t \in T = \{0, \pm 1, \pm 2, \ldots\}$ denote a weakly stationary,
multivariate stochastic sequence. Assume that the means and the
correlation structure of the sequence $X(t)$ are given by

$$E[X_j(t)] = 0, \qquad j = 0, 1, \ldots, p; \quad t \in T \tag{1}$$

$$E[X_j(t) X_k(t + \tau)] = R_{jk}(\tau) \tag{2}$$

for $j, k = 0, 1, \ldots, p$; $t, \tau \in T$. Although the functions
$R_{jk}(\tau)$ are usually unknown and must be estimated, the theory
of least squares estimation used here assumes the functions
$R_{jk}(\tau)$ to be known. The effect of this assumption will be
discussed later. The general theory of linear least squares
estimation in weakly stationary time series is discussed by Yaglom
(8, ch. 4). It is also assumed that $R_{jj}(0) = 1$, for
$j = 0, 1, \ldots, p$, which means the variance of each sequence is
unity. No assumption concerning the probability law describing the
behavior of the random variables $X_j(t)$ is made.

For each $j = 0, 1, \ldots, p$, let $T_j$ denote a subset of $T$ with

$$T_j = \{t_{j1}, t_{j2}, \ldots, t_{jn_j}\}.$$

The subset $T_j$ denotes the set of time points at which $X_j(t)$ has
been observed. Let $t' \in T - T_0$ so that the sequence $X_0(t)$ has
not been observed at the time point $t'$.

Based on the observations of the stochastic sequences $X_j(t)$ at
time points $t \in T_j$, $j = 0, 1, \ldots, p$, several different
estimators of the value of $X_0(t)$ for $t = t'$ could be obtained.
The two considered in this paper will be the univariate predictor
$\hat{X}_0(t)$ and the multivariate predictor $\tilde{X}_0(t)$. Both
predictors will be linear functions of the observations. The
univariate predictor, $\hat{X}_0(t)$, will be a linear function of
the observations of $X_0(t)$, $t \in T_0$, only. The multivariate
predictor, $\tilde{X}_0(t)$, will be a linear function of all the
observations of $X_j(t)$ for $t \in T_j$; $j = 0, 1, \ldots, p$.

The following matrix notation is introduced. Let:

$$X_j = [X_j(t_{j1}), X_j(t_{j2}), \ldots, X_j(t_{jn_j})]^T \qquad \text{for } j = 0, 1, \ldots, p$$

$$X = [X_0^T, X_1^T, \ldots, X_p^T]^T$$

$$A = [a_1, a_2, \ldots, a_{n_0}]$$

$$B_j = [b_{j,1}, b_{j,2}, \ldots, b_{j,n_j}] \qquad \text{for } j = 0, 1, \ldots, p$$

and

$$B = [B_0, B_1, \ldots, B_p].$$

Then the predictors $\hat{X}_0(t')$ and $\tilde{X}_0(t')$ may be
written as

$$\hat{X}_0(t') = A X_0 \tag{3}$$

$$\tilde{X}_0(t') = B X. \tag{4}$$

The errors of prediction are denoted by

$$e_1 = X_0(t') - \hat{X}_0(t') \tag{5}$$

$$e_2 = X_0(t') - \tilde{X}_0(t'). \tag{6}$$

Each of the random variables $e_1$ and $e_2$ has an associated
variance called the variance of prediction, denoted by
$\sigma^2(e_1)$ and $\sigma^2(e_2)$, respectively.

Let:

$$R_0 = E[X_0(t') X_0^T] \tag{7}$$

$$R_X = E[X_0(t') X^T] \tag{8}$$

$$S_{00} = E[X_0 X_0^T] \tag{9}$$

$$S_{XX} = E[X X^T]. \tag{10}$$

Then $\sigma^2(e_1)$ and $\sigma^2(e_2)$ may be written as

$$\sigma^2(e_1) = 1 - R_0 A^T - A R_0^T + A S_{00} A^T \tag{11}$$

$$\sigma^2(e_2) = 1 - R_X B^T - B R_X^T + B S_{XX} B^T. \tag{12}$$

The matrices of coefficients $A$ and $B$ are found by minimizing
$\sigma^2(e_1)$ with respect to $A$ and $\sigma^2(e_2)$ with respect
to $B$ giving

$$A = R_0 S_{00}^{-1} \tag{13}$$

$$B = R_X S_{XX}^{-1} \tag{14}$$

assuming the nonsingularity of the matrices $S_{00}$ and $S_{XX}$.
Substituting these values for $A$ and $B$ into the equations for the
univariate predictor $\hat{X}_0(t')$ and the multivariate predictor
$\tilde{X}_0(t')$ gives the linear least squares or minimum variance
estimators

$$\hat{X}_0(t') = R_0 S_{00}^{-1} X_0 \tag{15}$$

$$\tilde{X}_0(t') = R_X S_{XX}^{-1} X. \tag{16}$$

Let $\sigma_1^2$ and $\sigma_2^2$ denote the values of
$\sigma^2(e_1)$ and $\sigma^2(e_2)$ when $\hat{X}_0(t')$ and
$\tilde{X}_0(t')$ are so defined. Then:

$$\sigma_1^2 = \min \sigma^2(e_1) = 1 - R_0 S_{00}^{-1} R_0^T \tag{17}$$

$$\sigma_2^2 = \min \sigma^2(e_2) = 1 - R_X S_{XX}^{-1} R_X^T. \tag{18}$$

As is well known from the general theory of multivariate statistical
analysis (see Rao (7, p. 220)), it is always true that:

$$\sigma_2^2 \le \sigma_1^2. \tag{19}$$

How much less than $\sigma_1^2$ is $\sigma_2^2$ depends on the
correlation structure of the $(p+1)$-dimensional stochastic process
$(X_0(t), X_1(t), \ldots, X_p(t))$, $t \in T$.

Now the estimators $\hat{X}_0(t')$ and $\tilde{X}_0(t')$ given above
were derived under the assumption that the correlation matrices
$R_0$, $R_X$, $S_{00}$, $S_{XX}$ were known. Only rarely will it be
true that the elements of these matrices are known. Rather, these
elements must be estimated from the same sample upon which the
estimators $\hat{X}_0(t')$ and $\tilde{X}_0(t')$ are based.

Let $\hat{R}_0$, $\hat{R}_X$, $\hat{S}_{00}$, $\hat{S}_{XX}$ denote
ordinary sample estimates of the matrices $R_0$, $R_X$, $S_{00}$,
$S_{XX}$. Define the estimators $\hat{X}_0^{*}(t')$ and
$\tilde{X}_0^{*}(t')$ by

$$\hat{X}_0^{*}(t') = \hat{R}_0 \hat{S}_{00}^{-1} X_0 \tag{20}$$

$$\tilde{X}_0^{*}(t') = \hat{R}_X \hat{S}_{XX}^{-1} X. \tag{21}$$

Let

$$d_1 = X_0(t') - \hat{X}_0^{*}(t') \tag{22}$$

$$d_2 = X_0(t') - \tilde{X}_0^{*}(t') \tag{23}$$

denote the errors of $\hat{X}_0^{*}(t')$ and $\tilde{X}_0^{*}(t')$,
respectively. Then the variance, $S_1^2$, of $d_1$ is not equal to
$\sigma_1^2$ of equation 17 and the variance, $S_2^2$, of $d_2$ is
not equal to $\sigma_2^2$ of equation 18. In general, the variances
of $d_1$ and $d_2$ will be very difficult to evaluate analytically
for any assumed underlying probability law describing the variation
of the $X_j(t)$. Perhaps some results could be obtained under the
assumption of a multivariate autoregressive model as done by Matalas
(4). Until some results, either analytical or simulated, are
obtained, estimates of the variances $S_1^2$ and $S_2^2$ could be
formed by substituting sample estimates of the matrices $R_0$, $R_X$,
$S_{00}$, $S_{XX}$ into equations 17 and 18. Thus, $S_1^2$ and
$S_2^2$, the variances of $d_1$ and $d_2$, respectively, could be
estimated by

$$\hat{S}_1^2 = 1 - \hat{R}_0 \hat{S}_{00}^{-1} \hat{R}_0^T \tag{24}$$

$$\hat{S}_2^2 = 1 - \hat{R}_X \hat{S}_{XX}^{-1} \hat{R}_X^T. \tag{25}$$


The  estimation  of  the  elements  of  the  correlation 
matrices  in  equations  24  and  25  presents  serious 
difficulties  since  some  of  the  correlations  will  be 
for  lags  as  long  as  the  sample  itself  and  thus  have 
a  large  sampling  variance.  Again,  one  possible  way 
to  avoid  this  problem  would  be  to  assume  the  multi- 
dimensional sequence  follows  some  relatively  simple 
model,  such  as  a  multivariate  autoregressive  model, 
and  then  estimate  the  small  number  of  parameters 
involved  in  the  model.  The  pertinent  correlations 
would  be  functions  of  the  model  parameters.  Hence, 
these  correlations  could  be  estimated  by  substi- 
tuting the  estimates  of  the  model  parameters  into 
the  appropriate  functions. 


Application 

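The above method is applied to a bivariate lag one autoregressive
model. For $t \in T$, let $X_0(t)$ and $X_1(t)$ be defined by the
following recursion equations:

$$X_0(t+1) = \rho_0 X_0(t) + (1 - \rho_0^2)^{1/2} \epsilon_0(t+1) \tag{26}$$

$$X_1(t+1) = \rho_1 X_1(t) + (1 - \rho_1^2)^{1/2} \epsilon_1(t+1) \tag{27}$$

with

$$E[\epsilon_0(t)] = E[\epsilon_1(t)] = 0 \tag{28}$$

$$E[\epsilon_0(t)\epsilon_0(t+s)] = E[\epsilon_1(t)\epsilon_1(t+s)] = \begin{cases} 1, & s = 0 \\ 0, & s \ne 0 \end{cases} \tag{29}$$

$$E[\epsilon_0(t)\epsilon_1(t+s)] = \begin{cases} \dfrac{\rho(1 - \rho_0\rho_1)}{[(1-\rho_0^2)(1-\rho_1^2)]^{1/2}}, & s = 0 \\ 0, & s \ne 0. \end{cases} \tag{30}$$

The relevant moments of the two sequences $X_0(t)$ and $X_1(t)$ are

$$E[X_0(t)] = E[X_1(t)] = 0 \tag{31}$$

$$E[X_0(t) X_0(t+s)] = \rho_0^{|s|} \tag{32}$$

$$E[X_1(t) X_1(t+s)] = \rho_1^{|s|} \tag{33}$$

$$E[X_0(t) X_1(t+s)] = \begin{cases} \rho\rho_1^{s}, & s \ge 0 \\ \rho\rho_0^{|s|}, & s \le 0. \end{cases} \tag{34}$$

$X_0(t)$ and $X_1(t)$ could represent a two-season model of
streamflows where $X_0(t)$ is the flow for season zero in year $t$
and $X_1(t)$ the flow for season 1 in year $t$.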
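A short simulation can make the bivariate model concrete. The sketch below is illustrative only: it assumes Gaussian innovations (the paper makes no assumption about the probability law), it takes the lag-zero innovation correlation to be the value implied by requiring the lag-zero cross-correlation of the two sequences to equal rho, and the function names are hypothetical.

```python
import math
import random


def simulate_bivariate_ar1(n, rho0, rho1, rho, seed=None):
    """Simulate n steps of two unit-variance lag one autoregressive
    sequences with serial correlations rho0, rho1 and lag-zero
    cross-correlation rho."""
    # Innovation correlation implied by the lag-zero cross-correlation.
    c = rho * (1.0 - rho0 * rho1) / math.sqrt((1.0 - rho0**2) * (1.0 - rho1**2))
    if abs(c) > 1.0:
        raise ValueError("rho, rho0, rho1 are not mutually consistent")
    rng = random.Random(seed)
    # Start in the stationary distribution so no warm-up period is needed.
    x0 = rng.gauss(0.0, 1.0)
    x1 = rho * x0 + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
    series0, series1 = [x0], [x1]
    for _ in range(n - 1):
        e0 = rng.gauss(0.0, 1.0)
        e1 = c * e0 + math.sqrt(1.0 - c**2) * rng.gauss(0.0, 1.0)
        x0 = rho0 * x0 + math.sqrt(1.0 - rho0**2) * e0
        x1 = rho1 * x1 + math.sqrt(1.0 - rho1**2) * e1
        series0.append(x0)
        series1.append(x1)
    return series0, series1


def correlation(a, b):
    """Ordinary product-moment correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


s0, s1 = simulate_bivariate_ar1(50000, rho0=0.6, rho1=0.3, rho=0.5, seed=1971)
cross = correlation(s0, s1)                 # should be near rho = 0.5
lag1_s0 = correlation(s0[:-1], s0[1:])      # should be near rho0 = 0.6
lag1_s1 = correlation(s1[:-1], s1[1:])      # should be near rho1 = 0.3
print(f"cross = {cross:.3f}, lag1(X0) = {lag1_s0:.3f}, lag1(X1) = {lag1_s1:.3f}")
```

The consistency check on c mirrors the requirement that the three correlations cannot be chosen independently of one another.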

Suppose that the following sample has been observed:

$$X_0(1), X_0(2), \ldots, X_0(n-1)$$

$$X_1(1), X_1(2), \ldots, X_1(n-1), X_1(n)$$

and it is desired to estimate $X_0(n)$. If only the observed values
of $X_0(t)$ are to be used, the linear least squares theory yields
the univariate estimator

$$\hat{X}_0(n) = \rho_0 X_0(n-1). \tag{35}$$

If $\rho_0$ must be estimated, then the estimate of $X_0(n)$ is given
by:

$$\hat{X}_0^{*}(n) = r_0 X_0(n-1) \tag{36}$$

where $r_0$ is the ordinary estimate of the lag one serial
correlation coefficient.

If all the $n-1$ observations of $X_0(t)$ and the $n$ observations of
$X_1(t)$ are to be used to estimate $X_0(n)$, the linear least
squares theory yields the bivariate estimate

$$\tilde{X}_0(n) = \rho_0 X_0(n-1) + \frac{\rho(1 - \rho_0\rho_1)}{1 - \rho_1^2} [X_1(n) - \rho_1 X_1(n-1)]. \tag{37}$$

If $\rho_0$, $\rho_1$, $\rho$ are to be estimated, then an estimate
of $X_0(n)$ is given by

$$\tilde{X}_0^{*}(n) = r_0 X_0(n-1) + \frac{r(1 - r_0 r_1)}{1 - r_1^2} [X_1(n) - r_1 X_1(n-1)]. \tag{38}$$

Let

$$e_1 = X_0(n) - \hat{X}_0(n) \tag{39}$$

$$e_2 = X_0(n) - \tilde{X}_0(n) \tag{40}$$

so that

$$\sigma_1^2 = \mathrm{Var}(e_1) = 1 - \rho_0^2 \tag{41}$$

$$\sigma_2^2 = \mathrm{Var}(e_2) = 1 - \rho_0^2 - \frac{\rho^2(1 - \rho_0\rho_1)^2}{1 - \rho_1^2}. \tag{42}$$

As a measure of the additional information contained in the
estimator $\tilde{X}_0(n)$, define the relative efficiency, $E$, of
$\tilde{X}_0(n)$ to $\hat{X}_0(n)$ as

$$E = \frac{\sigma_1^2 - \sigma_2^2}{\sigma_1^2}. \tag{43}$$

Then from equations 41 and 42, equation 43 becomes

$$E = \frac{\rho^2(1 - \rho_0\rho_1)^2}{(1 - \rho_0^2)(1 - \rho_1^2)}. \tag{44}$$
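The closed-form variances can be checked against the general matrix expressions for the minimum variance of prediction. The sketch below is an illustration under stated assumptions: it uses only the three observations that enter the bivariate estimator, namely X0(n-1), X1(n-1), and X1(n), it builds the correlation matrices from the moments of the lag one model, and the helper names (`solve`, `prediction_variances`) are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def prediction_variances(rho0, rho1, rho):
    """Return (var1, var2, B) for predicting X0(n).

    var1 uses X0(n-1) only; var2 uses X0(n-1), X1(n-1), X1(n)."""
    var1 = 1.0 - rho0**2
    # Correlation matrix of the observation vector [X0(n-1), X1(n-1), X1(n)].
    S = [[1.0, rho, rho * rho1],
         [rho, 1.0, rho1],
         [rho * rho1, rho1, 1.0]]
    # Correlations between X0(n) and each observation.
    R = [rho0, rho * rho0, rho]
    B = solve(S, R)                      # S is symmetric, so B = R S^{-1}
    var2 = 1.0 - sum(b * r for b, r in zip(B, R))
    return var1, var2, B


rho0, rho1, rho = 0.6, 0.3, 0.5
var1, var2, B = prediction_variances(rho0, rho1, rho)
gain = rho * (1.0 - rho0 * rho1) / (1.0 - rho1**2)
closed_var2 = 1.0 - rho0**2 - rho**2 * (1.0 - rho0 * rho1) ** 2 / (1.0 - rho1**2)
efficiency = (var1 - var2) / var1
closed_eff = rho**2 * (1 - rho0 * rho1) ** 2 / ((1 - rho0**2) * (1 - rho1**2))
print(f"var1 = {var1:.4f}, var2 = {var2:.4f}, efficiency = {efficiency:.4f}")
```

The coefficient vector B returned by the solver should agree with the coefficients of the bivariate estimator, [rho0, -rho1 * gain, gain], which is a useful cross-check on the signs in the estimator.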
If $\rho$, $\rho_0$, $\rho_1$ must be estimated, take as an estimate
of $E$ the quantity

$$\hat{E} = \frac{r^2(1 - r_0 r_1)^2}{(1 - r_0^2)(1 - r_1^2)}. \tag{45}$$

In order that the correlation matrices of the sequences $X_0(t)$,
$X_1(t)$ be consistent, certain relations must exist between the
parameters $r$, $r_0$, $r_1$. Matalas and Wallis (6) have
investigated these relations, and one form they developed is that

$$0 \le \frac{r^2(1 - r_0 r_1)^2}{(1 - r_0^2)(1 - r_1^2)} \le 1 \tag{46}$$


or 


(47) 


Hence, given values of $r_0$ and $r_1$, $r$ must satisfy the
constraint

$$r^2 \le \frac{(1 - r_0^2)(1 - r_1^2)}{(1 - r_0 r_1)^2}. \tag{48}$$

Let

$$\Phi(r_0, r_1) = \frac{(1 - r_0 r_1)^2}{(1 - r_0^2)(1 - r_1^2)}. \tag{49}$$

To obtain some idea of how much information is added by the bivariate
estimator of $X_0(n)$, it is useful to see how the quantity
$\Phi(r_0, r_1)$ varies while $r$ is fixed and satisfies the
constraint (eq. 48). If $\Phi(r_0, r_1)$ is written as

$$\Phi(r_0, r_1) = 1 + \frac{(r_0 - r_1)^2}{(1 - r_0^2)(1 - r_1^2)} \tag{50}$$

it is easily seen that:

$$\Phi(r_0, r_1) = \Phi(-r_0, -r_1) \tag{51}$$

$$\Phi(r_0, -r_1) = \Phi(-r_0, r_1) \tag{52}$$

and:

$$\Phi(r_0, r_1) \ge 1. \tag{53}$$

The following tabulation shows how $\Phi(r_0, r_1)$ varies with $r_0$
and $r_1$:

            r_1
r_0         0       0.2     0.4     0.6     0.8
-0.8        2.77    3.89    5.76    9.51   20.7
-0.6        1.56    2.04    2.86    4.51    9.51
-0.4        1.19    1.45    1.91    2.86    5.76
-0.2        1.04    1.17    1.45    2.04    3.89
 0.0        1.00    1.04    1.19    1.56    2.77
 0.2        1.04    1.00    1.05    1.26    2.04
 0.4        1.19    1.05    1.00    1.07    1.53
 0.6        1.56    1.26    1.07    1.00    1.17
 0.8        2.77    2.04    1.53    1.17    1.00
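The tabulated quantity is simple to recompute, which also gives a check on the two algebraic forms and the symmetry relations. This is a minimal sketch; the function names `phi` and `phi_alt` are hypothetical, and the printed values are rounded to two decimals, so the last digit can differ from the published figures.

```python
def phi(r0, r1):
    """Ratio (1 - r0*r1)^2 / ((1 - r0^2)(1 - r1^2))."""
    return (1.0 - r0 * r1) ** 2 / ((1.0 - r0**2) * (1.0 - r1**2))


def phi_alt(r0, r1):
    """Equivalent form 1 + (r0 - r1)^2 / ((1 - r0^2)(1 - r1^2))."""
    return 1.0 + (r0 - r1) ** 2 / ((1.0 - r0**2) * (1.0 - r1**2))


r0_values = [-0.8, -0.6, -0.4, -0.2, 0.0, 0.2, 0.4, 0.6, 0.8]
r1_values = [0.0, 0.2, 0.4, 0.6, 0.8]

# Print the tabulation row by row (rows: r0, columns: r1).
print("r0    " + "".join(f"{r1:8.1f}" for r1 in r1_values))
for r0 in r0_values:
    print(f"{r0:5.1f} " + "".join(f"{phi(r0, r1):8.2f}" for r1 in r1_values))

# Largest admissible squared cross-correlation for a given pair (r0, r1).
max_r_squared = 1.0 / phi(-0.8, 0.8)
print(f"max r^2 at r0 = -0.8, r1 = 0.8: {max_r_squared:.4f}")
```

The relative efficiency estimate equals the squared cross-correlation times this ratio, so the ratio is largest, and the admissible cross-correlation smallest, when the two serial correlations differ most.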


For  more  complicated  models,  the  term  describ- 
ing the  reduction  in  variance  would  become  much 
more  complicated. 

Summary  and  Conclusions 

A  general  procedure  is  given  for  estimating  miss- 
ing values  in  one  component  time  series  of  a  mul- 
tivariate time  series.  The  procedure  is  based  on  the 
classical linear least squares principle, which uses
the criterion of minimum variance of estimation
error for choosing the estimator. All the pertinent
correlations are assumed known in the procedure.
When  the  correlations  are  not  known,  sample  values 
must  be  used  in  the  equation  defining  the  estimator. 
This  new  estimator  will  no  longer  be  a  minimum 
variance  estimator  but  will  be  an  estimate  of  the 
minimum  variance  estimator.  The  sampling  variance 
of  the  new  estimator  was  not  derived  but  was  esti- 
mated by  substituting  sample  values  of  the  cor- 
relations into  the  equation  for  the  variance  of  the 
minimum  variance  estimator. 

In  order  to  use  this  procedure,  it  is  necessary  to 
assume  a  model  for  the  multivariate  time  series  such 
as  a  multivariate  autoregressive  model.  One  dif- 
ficulty here  would  be  determining  the  appropriate 
multivariate  model  for  the  given  hydrologic  data. 
Perhaps  the  methods  described  by  Carlson,  Mac- 
Cormick  and  Watts  (2)  could  be  adapted  to  the 
multivariate  case. 
A MODEL OF STATISTICAL ERRORS IN ESTIMATING STREAMFLOW
FROM CONTINUOUS-STAGE RECORDS

By R. B. Thomas ¹


Abstract 

Determining  discharge  from  continuous-stage 
streamflow  records  is  subject  to  several  sources  of 
error.  When  a  power  function  is  used  as  a  rating 
model,  one  error  source  is  estimating  parameters 
for  the  model.  The  hydrograph  is  divided  into  a 
series  of  "intervals"  by  selecting  data  points  from 
it.  If  the  hydrograph  is  assumed  to  be  linear  during 
each  interval,  discharge  could  be  calculated  if  the 
rating  model  parameters  were  known.  The  errors 
in  flow  as  a  result  of  using  estimates  of  these 
parameters  are  discussed.  Approximate  estimators 
for  bias  and  standard  error  in  estimating  interval 
and  daily  flows  are  developed.  Consequences  of  the 
model  and  results  of  applying  these  estimators  on 
actual  streamflow  records  are  reported. 

Introduction 

Studies  of  land  management  practices  on  forested 
watersheds  depend  heavily  on  measurement  of 
streamflow.  To  assess  the  validity  of  the  results 
of  such  studies,  it  is  necessary  to  know  the  mag- 
nitude of  errors  inherent  in  the  measurement 
process.  Such  knowledge  would  be  particularly 
helpful  if  the  expected  changes  in  streamflow  pro- 
duced by  an  experiment  are  slight  and  could  be  the 
same  order  of  magnitude  as  the  errors  themselves. 

Sources of error in determining streamflow
discharge are numerous. In 1967, Dickinson ²
discussed these errors extensively and included a large
bibliography of previous work. He has also
developed an error model for discharge based on the
assumption that mean daily stage is known virtually
without error.

An  alternative  method  of  calculating  discharge 
would  be  to  collect  continuous  records  of  stage  over 


¹ Forest Service, USDA, stationed in Berkeley, Calif.
² Dickinson, W. T. ACCURACY OF DISCHARGE DETERMINATIONS.
Colo. State Univ. Hydrol. Paper 20, 54 pp., illus. 1967.


time.  (The  records  can  include  punched  paper  tape 
as  well  as  ink  stripcharts  if  readout  intervals  are 
short  enough  to  assure  that  no  significant  breaks 
are  missed.)  The  data  are  summarized  by  selecting 
time-stage  pairs  from  the  hydrograph  at  abrupt 
changes  in  slope.  These  "breakpoints"  serve  to 
divide  the  hydrograph  into  a  series  of  "intervals" 
of  unequal  length.  During  each  interval  the  trace 
can  be  considered  linear.  Flow  for  each  interval  is 
calculated  by  using  a  formula  with  parameters 
estimated  by  "rating"  each  stream-gaging  station. 

A  commonly  used  model  to  rate  stream-gaging 
stations  is: 

$$q = \alpha_0 (s_v - s_0)^{\alpha_1} \qquad (1)$$

where $q$ is the discharge; $s_v$ the virtual (or observed)
stage; and $s_0$, $\alpha_0$, and $\alpha_1$ are model parameters to be
determined for each stream-gaging situation.

The parameters can be estimated using a set of
sample pairs $(s, q)$. The values of $q$ are found for a
range of values of $s$ by measuring stream velocities
with a current meter at various positions in the
stream cross section according to standard
practices.³ Such a model is commonly used to describe
the rating function (or portions of it) for natural
channels as well as certain artificial controls. There
may be similar models for different ranges of the
rating, each model having its own parameters.

Under  the  foregoing  conditions,  an  equation  can 
be  developed  that  computes  "exact"  flow  volume 
during  each  interval.  The  calculated  volumes  are 
correct  only  if  the  assumptions  of  the  model  are  met, 
and the stages, times, and model parameters $s_0$,
$\alpha_0$, and $\alpha_1$ are known without error. Of course, in
practice, they are not.

³ Buchanan, Thomas J., and Somers, William P. DISCHARGE
MEASUREMENTS AT GAGING STATIONS. In Techniques of Water-
Resources Investigations of the United States Geological Survey.
Book 3. Applications of Hydraulics. U.S. Geol. Survey. 65 pp.,
illus. 1969.

This paper describes a simplified model in which
it is assumed that the form of the rating function
is correct and that the stages and times are measured
with negligible error. In this model, errors in
estimating flow volumes are due solely to sample variation
of the set of $(s, q)$ pairs used for estimating
parameters and to the propagation of the resultant errors
through the equation for flow during an interval.
That is, for any given interval, the estimated flow
will depend only on the values of the model
parameters estimated from the set of $(s, q)$ pairs
collected. Approximate estimators are given for the
bias and variance involved in estimating interval
flow. These estimators are then used to develop
error expressions for daily flows. Preliminary results
of applying these estimates on actual streamflow
records are reported.


Rating  or  Calibration 

If a rating model of the form given in equation
1 is postulated, it is necessary to verify this
assumption and to estimate the values of $s_0$, $\alpha_0$, $\alpha_1$. This is
done by collecting a set of pairs $(s, q)$ which span,
to the extent practicable, the range of flows to be
expected in field measurements.

The  set  of  pairs  is  obtained  by  measuring  stage 
with  a  staff  gage  or  recording  device  while  also 
measuring  a  series  of  stream  velocities  at  selected 
points  in  the  stream  cross  section.  From  the  data  on 
velocity and position, an estimate of $q$ can be
calculated.⁴

Although both $s$ and $q$ are known with error,
measurement of $s$ is subject to more direct
procedures, which make its determination much more
certain. The value of $q$, however, depends on many
indirect measurements, such as the dimensions of
the channel cross section and the positions of the
current velocity meter as well as the peculiarities
to which a current meter is prone. These problems
result in much less certainty in measuring the
discharge.

The assumption usually made (implicitly or
otherwise) is that $s$ is known exactly and that all error is
involved in measuring $q$. This assumption is made
here.

Define the actual or true stage as

$$s = s_v - s_0. \qquad (2)$$

Substituting in equation 1,

$$q = \alpha_0 s^{\alpha_1}. \qquad (3)$$

The value of $s_0$ under some circumstances can be
considered a statistical artifact which assists in
validating the transformation given below. Under
other conditions, $s_0$ is the value of $s_v$ for zero
discharge. Since $s_v$ is measured, finding $s$ requires
knowing the value of $s_0$. This quantity can be
estimated by graphical, numerical, or analytical
methods.⁵ For the preliminary model, it was
assumed that $s_0$ is known. This assumption reduces
the model in equation 1 to the form of equation 3.

A logarithmic transformation of equation 3 often
allows a satisfactory linear regression fit to the set
of stage-discharge pairs. This transformation gives

$$\ln q = \ln \alpha_0 + \alpha_1 \ln s \qquad (4)$$

which is of the form,

$$y = \beta_0 + \beta_1 x \qquad (5)$$

by letting,

$$y = \ln q, \quad x = \ln s, \quad \beta_0 = \ln \alpha_0, \quad \text{and} \quad \beta_1 = \alpha_1. \qquad (6)$$

Equation 5 suggests the regression model

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i \qquad (7)$$

where the $i$ subscript denotes the collected points,
and $\epsilon_i$ is a random variable expressing the error
involved in measuring $y_i$. It should be remembered
that this regression model assumes

$$E(\epsilon_i) = 0$$

and

$$\operatorname{Var}(\epsilon_i) = \sigma^2 \qquad (8)$$

for all $i$. That is, the true variance in $\ln q$ is assumed
constant throughout the range of $\ln s$. This means
the standard deviation of the measurement error in $q$ is
assumed to be roughly proportional to discharge.

⁴ See footnote 3.
⁵ Carter, R. W., and Davidian, J. DISCHARGE RATINGS AT GAGING
STATIONS. In Surface Water Techniques. Book 1. Hydraulic
Measurements and Computation. U.S. Geological Survey,
36 pp., illus. 1965.
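The rating regression of equation 7 can be sketched in a few lines. This is a minimal illustration, assuming ordinary least squares on the log-transformed pairs and the textbook expressions for the variances and covariance of the two coefficients; the function name `fit_rating`, the dictionary keys, and the example data are hypothetical, not taken from the paper. The stages passed in are true stages $s$, already reduced by $s_0$.

```python
import math


def fit_rating(stages, discharges):
    """Least-squares fit of ln q = b0 + b1 * ln s (needs at least three pairs)."""
    x = [math.log(s) for s in stages]
    y = [math.log(q) for q in discharges]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual mean square estimates sigma^2 with n - 2 degrees of freedom.
    sigma2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return {
        "b0": b0,
        "b1": b1,
        "sigma2": sigma2,
        "var_b0": sigma2 * sum(xi * xi for xi in x) / (n * sxx),
        "var_b1": sigma2 / sxx,
        "cov_b0_b1": -sigma2 * x_bar / sxx,
    }


if __name__ == "__main__":
    # Hypothetical rating measurements (stage in feet, discharge in cfs).
    fit = fit_rating([0.5, 1.0, 1.5, 2.0, 3.0], [0.7, 2.1, 3.5, 5.9, 10.0])
    print(fit)
```

The back-transformed rating parameters would then be `math.exp(fit["b0"])` for $\alpha_0$ and `fit["b1"]` for $\alpha_1$.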

From this model, standard linear regression
theory yields unbiased estimates of $\beta_0$ and $\beta_1$, their
variances, and covariance. By use of the last two
equations of "equation 6," the estimates are

$$\hat\alpha_0 = e^{\hat\beta_0}$$

and

$$\hat\alpha_1 = \hat\beta_1 \qquad (9)$$

where the hats denote appropriate estimated
quantities.


Monitoring a Stream

Once a stream-gaging station has been calibrated,
the stage can be monitored over time and flow
volumes and mean discharges for various time
periods can be calculated. To do so with time-stage
pairs collected at breakpoints on the hydrograph
trace, an expression must be developed giving flow
volume during an interval. This expression is based
on the form of the rating function.

Consider selecting $n + 1$ time-stage pairs from
a daily hydrograph. One such point is always
selected at the beginning and one at the end of the
day, and one is taken at each of the $n - 1$
breakpoints in between. This results in $n$ intervals during
the day.

Denote the points by $(t_i, s_i)$, $i = 0, 1, \ldots, n$,
where $t_i$ is the time and $s_i$ the stage. Then the ith
interval begins at time $t_{i-1}$ and ends at time $t_i$.
Because of the method of point selection, for the
ith interval $[t_{i-1}, t_i]$ we have the linear function

$$s = m_i t + b_i \qquad (10)$$

where

$$m_i = \frac{s_i - s_{i-1}}{t_i - t_{i-1}} \qquad (11)$$

is its slope and $b_i$ its intercept.

The function in equation 10 represents the
stage hydrograph. Substituting this function in
equation 3 gives a representation of the discharge
hydrograph for the ith interval. The integral of the
discharge hydrograph over the interval gives the
total flow. That is, letting $f_i$ denote the flow volume
for the ith interval,

$$f_i = \int_{t_{i-1}}^{t_i} q \, dt \qquad (12)$$

or, if the aforementioned substitution is made,

$$f_i = \int_{t_{i-1}}^{t_i} \alpha_0 (m_i t + b_i)^{\alpha_1} \, dt. \qquad (13)$$

To evaluate the integral in equation 13 it is
necessary to distinguish two cases. First suppose
$s_{i-1} \ne s_i$. This will be referred to as a nonzero slope
interval. Evaluating equation 13, and using equation
10 to express the resulting flow in terms of stage,
gives

$$f_i = \frac{\alpha_0}{m_i(\alpha_1 + 1)} \left[ s_i^{\alpha_1 + 1} - s_{i-1}^{\alpha_1 + 1} \right], \quad s_{i-1} \ne s_i. \qquad (14)$$

A zero slope interval has $s_{i-1} = s_i$, implying that
$m_i = 0$ so equation 13 becomes

$$f_i = \alpha_0 s_i^{\alpha_1} \int_{t_{i-1}}^{t_i} dt \qquad (15)$$

which gives

$$f_i = \alpha_0 s_i^{\alpha_1} (t_i - t_{i-1}), \quad s_{i-1} = s_i. \qquad (16)$$

In practice, $\alpha_0$ and $\alpha_1$ are not known, so the
estimated values must be used in equations 14
and 16. Substituting in terms of the regression
estimates given in equation 9 and denoting the
estimate of $f_i$ by $\hat f_i$ results in

$$\hat f_i = \begin{cases} \dfrac{e^{\hat\beta_0}}{m_i(\hat\beta_1 + 1)} \left( s_i^{\hat\beta_1 + 1} - s_{i-1}^{\hat\beta_1 + 1} \right), & s_{i-1} \ne s_i \\[2ex] e^{\hat\beta_0} s_i^{\hat\beta_1} (t_i - t_{i-1}), & s_{i-1} = s_i. \end{cases} \qquad (17)$$
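The two cases of equations 17 translate directly into code. The sketch below assumes stages are true stages $s$ (virtual stage less $s_0$) and that times are in units consistent with the discharge; `interval_flow` and `daily_flow` are hypothetical names introduced for illustration.

```python
import math


def interval_flow(t_prev, s_prev, t_cur, s_cur, b0, b1):
    """Estimated flow volume for one interval, following the two cases of equations 17."""
    dt = t_cur - t_prev
    if s_cur == s_prev:
        # Zero slope interval: constant discharge times the interval length.
        return math.exp(b0) * s_cur ** b1 * dt
    # Nonzero slope interval: closed-form integral of the power-function rating.
    m = (s_cur - s_prev) / dt
    p = b1 + 1.0
    return math.exp(b0) / (m * p) * (s_cur ** p - s_prev ** p)


def daily_flow(points, b0, b1):
    """Sum interval flows over consecutive (time, stage) breakpoints."""
    return sum(
        interval_flow(t0, s0, t1, s1, b0, b1)
        for (t0, s0), (t1, s1) in zip(points, points[1:])
    )


if __name__ == "__main__":
    # Hypothetical breakpoints: (hours, feet of true stage).
    breakpoints = [(0.0, 1.0), (2.0, 3.0), (5.0, 3.0)]
    print(daily_flow(breakpoints, b0=0.0, b1=1.0))
```

When two stages differ only in the last few digits, the nonzero slope branch subtracts nearly equal numbers; a production implementation might treat such intervals as zero slope within a tolerance.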


Errors  in  Flow  Estimates 

Define the random variable $\omega_i$ so that

$$\hat f_i = f_i + \omega_i. \qquad (18)$$

Since the random variables $\hat\beta_0$ and $\hat\beta_1$ in equations 17
appear in the denominator and in powers, $E(\hat f_i) \ne f_i$.
Then, from equation 18,

$$E(\omega_i) = E(\hat f_i) - f_i \qquad (19)$$

gives a measure of bias in the ith interval. Also, from
equation 18

$$\operatorname{var}(\omega_i) = \operatorname{var}(\hat f_i) \qquad (20)$$

is a measure of variation in the flow estimates due to
variation in estimating $\beta_0$ and $\beta_1$ and the effects of
equations 17.

The next two sections discuss $E(\omega_i)$ and
$\operatorname{var}(\omega_i)$ for interval and daily flows. For convenience, a
summary of all assumptions in the model is given
first.

(1) A functional relationship exists between the
discharge and stage of the form given in
equation 1.

(2) The value of the virtual stage for zero
discharge is known.

(3) During the rating procedure, $s$ is assumed
measured without error so that the variance
in the observed value of $\ln q$ is due solely to
errors in its measurement.

(4) The mean of the measurement errors in $\ln q$
is zero and its variance is $\sigma^2$ (unknown),
which is constant over the range in $\ln s$.

(5) The rating remains the same during the
monitoring period.

(6) During monitoring, $s$ and $t$ are measured
without error.

(7) The stage hydrograph between adjacent
points $(t, s)$ is linear.


Errors  in  Interval  Flow 

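All estimates developed in this paper are second
order approximations based on expansion of
equations 17 in powers of $\delta\beta_0$ and $\delta\beta_1$, where

$$\delta\beta_0 = \hat\beta_0 - \beta_0$$

and

$$\delta\beta_1 = \hat\beta_1 - \beta_1. \qquad (21)$$

Since $\hat\beta_0$ and $\hat\beta_1$ are unbiased estimators, this gives
results in terms of $\beta_0$, $\beta_1$, $\operatorname{var}(\hat\beta_0)$, $\operatorname{var}(\hat\beta_1)$, and
$\operatorname{cov}(\hat\beta_0, \hat\beta_1)$. Unbiased estimators for all these
quantities are given by the regression.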

Bias  in  an  Interval 

We begin by expanding the first of equations 17.
For convenience, define

$$g_i = \frac{e^{\hat\beta_0}}{\hat\beta_1 + 1} \, s_i^{\hat\beta_1 + 1}, \quad i = 0, 1, \ldots, n \qquad (22)$$

so that the first of equations 17 can be expressed as

$$\hat f_i = \frac{1}{m_i} (g_i - g_{i-1}). \qquad (23)$$

Equation 22 can be written as

$$g_i = e^{\hat\beta_0 - \beta_0 + \beta_0} (\hat\beta_1 + 1 - \beta_1 + \beta_1)^{-1} e^{(\hat\beta_1 + 1 - \beta_1 + \beta_1) \ln s_i} \qquad (24)$$

or as

$$g_i = K s_i^{1 + \beta_1} \left( 1 + \frac{\delta\beta_1}{1 + \beta_1} \right)^{-1} e^{\delta\beta_0 + \delta\beta_1 \ln s_i} \qquad (25)$$

if we let

$$K = \frac{e^{\beta_0}}{1 + \beta_1}. \qquad (26)$$

The last two factors of equation 25 (the only ones
containing random variables) can be written in
powers of $\delta\beta_i$ using familiar series expansions.

The reciprocal quantity will be expanded
according to the form

$$(1 + x)^{-1} = 1 - x + x^2 - x^3 + \ldots \qquad (27)$$
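which requires $x^2 < 1$ for convergence. In the
present case, this means having

$$-1 < \frac{\delta\beta_1}{1 + \beta_1} < 1. \qquad (28)$$

Assuming $\beta_1 > 0$, which is true in any nontrivial
case, the left-hand inequality becomes

$$\hat\beta_1 > -1. \qquad (29)$$

This can easily be checked. The right-hand
inequality in equation 28 reduces to

$$\hat\beta_1 < 2\beta_1 + 1. \qquad (30)$$

Since $\beta_1$ is not known, we can not be certain that
equation 30 is true in any specific case. Experience,
however, provides a guide to the magnitude of $\beta_1$,
and unless our estimate is a very unlikely one, it
will be less than one more than twice the true value.

Equation 25 can now be written as

$$g_i = K s_i^{1 + \beta_1} \left( 1 - \frac{\delta\beta_1}{1 + \beta_1} + \frac{\delta\beta_1^2}{(1 + \beta_1)^2} - \ldots \right) \left( 1 + (\delta\beta_0 + \delta\beta_1 \ln s_i) + \frac{(\delta\beta_0 + \delta\beta_1 \ln s_i)^2}{2} + \ldots \right) \qquad (31)$$

and

$$g_i = K s_i^{1 + \beta_1} \left( 1 + (\delta\beta_0 + \delta\beta_1 \ln s_i) - \frac{\delta\beta_1}{1 + \beta_1} + \frac{(\delta\beta_0 + \delta\beta_1 \ln s_i)^2}{2} - \frac{\delta\beta_1}{1 + \beta_1} (\delta\beta_0 + \delta\beta_1 \ln s_i) + \frac{\delta\beta_1^2}{(1 + \beta_1)^2} + \ldots \right). \qquad (32)$$

Equation 32 reduces to

$$\begin{aligned} g_i = K s_i^{1 + \beta_1} \Big( 1 &+ \delta\beta_0 + \delta\beta_1 \Big( \ln s_i - \frac{1}{1 + \beta_1} \Big) + \frac{1}{2} \delta\beta_0^2 + \delta\beta_0 \delta\beta_1 \Big( \ln s_i - \frac{1}{1 + \beta_1} \Big) \\ &+ \delta\beta_1^2 \Big( \frac{(\ln s_i)^2}{2} - \frac{\ln s_i}{1 + \beta_1} + \frac{1}{(1 + \beta_1)^2} \Big) + \ldots \Big). \end{aligned} \qquad (33)$$

Taking the expected value of equation 33 gives

$$\begin{aligned} E(g_i) = K s_i^{1 + \beta_1} \Big( 1 &+ \frac{1}{2} \operatorname{var}(\hat\beta_0) + \operatorname{cov}(\hat\beta_0, \hat\beta_1) \Big( \ln s_i - \frac{1}{1 + \beta_1} \Big) \\ &+ \operatorname{var}(\hat\beta_1) \Big( \frac{(\ln s_i)^2}{2} - \frac{\ln s_i}{1 + \beta_1} + \frac{1}{(1 + \beta_1)^2} \Big) + \ldots \Big). \end{aligned} \qquad (34)$$

From equation 23

$$E(\hat f_i) = \frac{1}{m_i} (E(g_i) - E(g_{i-1})). \qquad (35)$$

Using equation 34 in 35 gives an expression for $E(\hat f_i)$:

$$E(\hat f_i) = \frac{K}{m_i} \left( (1 + \operatorname{var}(\hat\beta_0)/2) \, \theta_{i0} + \operatorname{cov}(\hat\beta_0, \hat\beta_1) \Phi_{i1} + \operatorname{var}(\hat\beta_1) \Phi_{i2} + \ldots \right), \qquad (36)$$

where an economy of notation has been gained by
defining

$$\begin{aligned} \theta_{ij} &= s_i^{1 + \beta_1} (\ln s_i)^j - s_{i-1}^{1 + \beta_1} (\ln s_{i-1})^j, \\ \Phi_{i1} &= \theta_{i1} - \theta_{i0}/(1 + \beta_1), \\ \Phi_{i2} &= \theta_{i2}/2 - \theta_{i1}/(1 + \beta_1) + \theta_{i0}/(1 + \beta_1)^2. \end{aligned} \qquad (37)$$

Equation 36 holds when $s_{i-1} \ne s_i$. Applying equation
36 to 19, an estimate of the bias for the ith interval
becomes

$$E(\omega_i) = \frac{K}{m_i} \left( \operatorname{var}(\hat\beta_0) \, \theta_{i0}/2 + \operatorname{cov}(\hat\beta_0, \hat\beta_1) \Phi_{i1} + \operatorname{var}(\hat\beta_1) \Phi_{i2} + \ldots \right) \qquad (38)$$

for the case where $s_{i-1} \ne s_i$. In practice, since the
true values of $\beta_0$, $\beta_1$, and the variances and
covariance of $\hat\beta_0$ and $\hat\beta_1$ are not known, their estimated
values must be used in equation 38.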

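The second of equations 17 must be expanded to
derive an expression for the bias in a zero slope
interval. It can be written

$$\hat f_i = (t_i - t_{i-1}) \, e^{\beta_0} s_i^{\beta_1} e^{\delta\beta_0} e^{\delta\beta_1 \ln s_i}. \qquad (39)$$

Expanding the last two factors of equation 39 and
multiplying, gives

$$\hat f_i = (t_i - t_{i-1}) \, e^{\beta_0} s_i^{\beta_1} \left( 1 + (\delta\beta_0 + \delta\beta_1 \ln s_i) + \left( \frac{\delta\beta_0^2}{2} + \delta\beta_0 \delta\beta_1 \ln s_i + \frac{\delta\beta_1^2 (\ln s_i)^2}{2} \right) + \ldots \right) \qquad (40)$$

which has the expected value

$$E(\hat f_i) = (t_i - t_{i-1}) \, e^{\beta_0} s_i^{\beta_1} \left( 1 + \frac{1}{2} \operatorname{var}(\hat\beta_0) + \operatorname{cov}(\hat\beta_0, \hat\beta_1) \ln s_i + \frac{1}{2} \operatorname{var}(\hat\beta_1) (\ln s_i)^2 + \ldots \right). \qquad (41)$$

Applied to equation 19, equation 41 gives

$$E(\omega_i) = (t_i - t_{i-1}) \, e^{\beta_0} s_i^{\beta_1} \left( \frac{1}{2} \operatorname{var}(\hat\beta_0) + \operatorname{cov}(\hat\beta_0, \hat\beta_1) \ln s_i + \frac{1}{2} \operatorname{var}(\hat\beta_1) (\ln s_i)^2 + \ldots \right) \qquad (42)$$

as an estimate for the bias in the ith interval for
the case where $s_{i-1} = s_i$.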
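The two bias estimators (equations 38 and 42) can be sketched as follows. This is an illustration only, assuming the definitions of $K$, $\theta_{ij}$, $\Phi_{i1}$, and $\Phi_{i2}$ in equations 26 and 37 and truncating the series after the second-order terms; the names `theta` and `interval_bias` are hypothetical.

```python
import math


def theta(s_prev, s_cur, b1, j):
    """theta_ij of equation 37 for the interval running from s_prev to s_cur."""
    p = 1.0 + b1
    return s_cur ** p * math.log(s_cur) ** j - s_prev ** p * math.log(s_prev) ** j


def interval_bias(t_prev, s_prev, t_cur, s_cur, b0, b1, var_b0, var_b1, cov_b0_b1):
    """Second-order bias of the interval flow estimate (equation 38 or 42)."""
    dt = t_cur - t_prev
    if s_cur == s_prev:
        # Zero slope interval, equation 42.
        ln_s = math.log(s_cur)
        bracket = 0.5 * var_b0 + cov_b0_b1 * ln_s + 0.5 * var_b1 * ln_s ** 2
        return dt * math.exp(b0) * s_cur ** b1 * bracket
    # Nonzero slope interval, equation 38.
    p = 1.0 + b1
    th0 = theta(s_prev, s_cur, b1, 0)
    th1 = theta(s_prev, s_cur, b1, 1)
    th2 = theta(s_prev, s_cur, b1, 2)
    phi1 = th1 - th0 / p
    phi2 = th2 / 2.0 - th1 / p + th0 / p ** 2
    k = math.exp(b0) / p
    m = (s_cur - s_prev) / dt
    return k / m * (var_b0 * th0 / 2.0 + cov_b0_b1 * phi1 + var_b1 * phi2)


if __name__ == "__main__":
    # Hypothetical regression results; stages in feet, time in hours.
    print(interval_bias(0.0, 1.0, 2.0, 3.0, 0.3, 1.7, 0.01, 0.02, -0.005))
```

In practice the true coefficients are replaced by their regression estimates, as the text notes for equation 38.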

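Variance in an Interval

Variances for both interval types developed in
this section are based on the equation

$$\operatorname{var}(x) = E(x^2) - (E(x))^2. \qquad (43)$$

For the nonzero slope interval, substitution of
equation 33 into equation 23 and using the
abbreviations in equation 37 results in

$$\hat f_i = \frac{K}{m_i} \left( \theta_{i0} + (\delta\beta_0 \theta_{i0} + \delta\beta_1 \Phi_{i1}) + (\delta\beta_0^2 \theta_{i0}/2 + \delta\beta_0 \delta\beta_1 \Phi_{i1} + \delta\beta_1^2 \Phi_{i2}) + \ldots \right). \qquad (44)$$

Squaring equation 44 gives

$$\begin{aligned} \hat f_i^2 = \left( \frac{K}{m_i} \right)^2 \Big( \theta_{i0}^2 &+ 2 (\delta\beta_0 \theta_{i0}^2 + \delta\beta_1 \theta_{i0} \Phi_{i1}) \\ &+ (\delta\beta_0^2 \theta_{i0}^2 + 2 \delta\beta_0 \delta\beta_1 \theta_{i0} \Phi_{i1} + 2 \delta\beta_1^2 \theta_{i0} \Phi_{i2}) \\ &+ (\delta\beta_0^2 \theta_{i0}^2 + \delta\beta_1^2 \Phi_{i1}^2 + 2 \delta\beta_0 \delta\beta_1 \theta_{i0} \Phi_{i1}) + \ldots \Big). \end{aligned} \qquad (45)$$

The expected value of equation 45 can be
reduced to

$$E(\hat f_i^2) = \left( \frac{K}{m_i} \right)^2 \left( \theta_{i0}^2 (1 + 2 \operatorname{var}(\hat\beta_0)) + \operatorname{var}(\hat\beta_1) (2 \theta_{i0} \Phi_{i2} + \Phi_{i1}^2) + 4 \operatorname{cov}(\hat\beta_0, \hat\beta_1) \theta_{i0} \Phi_{i1} + \ldots \right). \qquad (46)$$

The expected value of equation 44 is

$$E(\hat f_i) = \frac{K}{m_i} \left( \theta_{i0} + (\operatorname{var}(\hat\beta_0) \theta_{i0}/2 + \operatorname{var}(\hat\beta_1) \Phi_{i2} + \operatorname{cov}(\hat\beta_0, \hat\beta_1) \Phi_{i1}) + \ldots \right). \qquad (47)$$

To the second order of approximation, the square
of equation 47 is

$$[E(\hat f_i)]^2 = \left( \frac{K}{m_i} \right)^2 \left( \theta_{i0}^2 + \operatorname{var}(\hat\beta_0) \theta_{i0}^2 + 2 \operatorname{var}(\hat\beta_1) \theta_{i0} \Phi_{i2} + 2 \operatorname{cov}(\hat\beta_0, \hat\beta_1) \theta_{i0} \Phi_{i1} + \ldots \right). \qquad (48)$$

Substituting equations 46 and 48 into equation 43
gives

$$\operatorname{var}(\hat f_i) = \left( \frac{K}{m_i} \right)^2 \left( \operatorname{var}(\hat\beta_0) \theta_{i0}^2 + \operatorname{var}(\hat\beta_1) \Phi_{i1}^2 + 2 \operatorname{cov}(\hat\beta_0, \hat\beta_1) \theta_{i0} \Phi_{i1} + \ldots \right) \qquad (49)$$

which applies to nonzero slope intervals.

The variance for a zero slope interval is calculated
in an entirely similar manner. Using the expected
value of the square of equation 40 and the square
of equation 41 in equation 43 gives

$$\operatorname{var}(\hat f_i) = (t_i - t_{i-1})^2 e^{2\beta_0} s_i^{2\beta_1} \left( \operatorname{var}(\hat\beta_0) + \operatorname{var}(\hat\beta_1) (\ln s_i)^2 + 2 \operatorname{cov}(\hat\beta_0, \hat\beta_1) \ln s_i + \ldots \right) \qquad (50)$$

which expresses the variance in flow for a zero slope
interval.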
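A compact sketch of the two variance expressions (equations 49 and 50), again truncated at second order. The function names `interval_variance` and `percent_standard_error` are hypothetical, and the latter simply expresses the standard error as a percentage of a flow supplied by the caller, in the way the paper's tables appear to report it.

```python
import math


def interval_variance(t_prev, s_prev, t_cur, s_cur, b0, b1, var_b0, var_b1, cov_b0_b1):
    """Second-order variance of the interval flow estimate (equation 49 or 50)."""
    dt = t_cur - t_prev
    if s_cur == s_prev:
        # Zero slope interval, equation 50.
        ln_s = math.log(s_cur)
        scale = (dt * math.exp(b0) * s_cur ** b1) ** 2
        return scale * (var_b0 + var_b1 * ln_s ** 2 + 2.0 * cov_b0_b1 * ln_s)
    # Nonzero slope interval, equation 49.
    p = 1.0 + b1
    m = (s_cur - s_prev) / dt
    k = math.exp(b0) / p
    theta0 = s_cur ** p - s_prev ** p
    theta1 = s_cur ** p * math.log(s_cur) - s_prev ** p * math.log(s_prev)
    phi1 = theta1 - theta0 / p
    return (k / m) ** 2 * (
        var_b0 * theta0 ** 2 + var_b1 * phi1 ** 2 + 2.0 * cov_b0_b1 * theta0 * phi1
    )


def percent_standard_error(variance, flow):
    """Standard error expressed as a percentage of the given flow."""
    return 100.0 * math.sqrt(variance) / flow


if __name__ == "__main__":
    v = interval_variance(0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 0.01, 0.0, 0.0)
    print(v, percent_standard_error(v, 4.0))
```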

Errors  in  Daily  Flows 

From the work already done, it was a simple
matter to estimate the bias in flow volume for an
entire day. Let

$$F = \sum_{i=1}^{n} f_i \qquad (51)$$

and

$$\hat F = \sum_{i=1}^{n} \hat f_i \qquad (52)$$

denote, respectively, the "true" and estimated
daily flows. These quantities are related by the
random variable $\Omega$ such that

$$\hat F = F + \Omega. \qquad (53)$$

Substituting equations 51 and 52 into equation 53,
solving for $\Omega$, and taking expected value results in

$$E(\Omega) = E\left( \sum_i \hat f_i - \sum_i f_i \right). \qquad (54)$$

This can be written

$$E(\Omega) = \sum_i (E(\hat f_i) - f_i), \qquad (55)$$

which, by equation 19, gives

$$E(\Omega) = \sum_i E(\omega_i). \qquad (56)$$

Thus, the bias in daily flow volume can be estimated
by adding the estimated biases, computed as
appropriate from equation 38 or 42, for each interval
throughout the day. Bias in mean daily flow can now
be easily estimated by dividing the daily volume bias
by the length of the day in the appropriate units.

Estimating the variance in daily flow volume
estimates is not as convenient. Note that

$$\operatorname{var}(\hat F) = E(\hat F - E(\hat F))^2 \qquad (57)$$

which, by equations 51 and 52, can be written as

$$\operatorname{var}(\hat F) = E\left\{ \sum_i (\hat f_i - E(\hat f_i)) \right\}^2 \qquad (58)$$

which can be partitioned in the form

$$\operatorname{var}(\hat F) = E \sum_{i=1}^{n} (\hat f_i - E(\hat f_i))^2 + 2 E \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} (\hat f_i - E(\hat f_i)) (\hat f_j - E(\hat f_j)) \qquad (59)$$

but this can be written

$$\operatorname{var}(\hat F) = \sum_{i=1}^{n} \operatorname{var}(\hat f_i) + 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \operatorname{cov}(\hat f_i, \hat f_j). \qquad (60)$$

Equation  60  shows  that  we  need  estimators  for 
covariance  between  all  pairs  of  interval  flows 
during  a  day.  Expressions  for  the  variance  were 
developed  in  the  previous  section.  We  now  turn  to 
the  development  of  covariances  between  interval 
flows. 

Covariance  Between  Interval  Flows 

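Equations for the covariance between flows in
any two intervals are different, depending on
whether both are nonzero slope, zero slope, or one
of each type. Forms for the three different cases
are all developed from the general equation

$$\operatorname{cov}(x, y) = E(xy) - E(x) E(y). \qquad (61)$$

First consider the case where both the ith and
jth intervals are of nonzero slope. Using equation
44, the product of $\hat f_i$ and $\hat f_j$ can be written as

$$\begin{aligned} \hat f_i \hat f_j = \frac{K^2}{m_i m_j} \Big( \theta_{i0} \theta_{j0} &+ \theta_{i0} (\delta\beta_0 \theta_{j0} + \delta\beta_1 \Phi_{j1}) + \theta_{j0} (\delta\beta_0 \theta_{i0} + \delta\beta_1 \Phi_{i1}) \\ &+ \theta_{i0} (\delta\beta_0^2 \theta_{j0}/2 + \delta\beta_0 \delta\beta_1 \Phi_{j1} + \delta\beta_1^2 \Phi_{j2}) \\ &+ \theta_{j0} (\delta\beta_0^2 \theta_{i0}/2 + \delta\beta_0 \delta\beta_1 \Phi_{i1} + \delta\beta_1^2 \Phi_{i2}) \\ &+ \delta\beta_0^2 \theta_{i0} \theta_{j0} + \delta\beta_0 \delta\beta_1 \theta_{i0} \Phi_{j1} + \delta\beta_0 \delta\beta_1 \theta_{j0} \Phi_{i1} + \delta\beta_1^2 \Phi_{i1} \Phi_{j1} + \ldots \Big). \end{aligned} \qquad (62)$$

Taking the expected value of equation 62 and
simplifying results in

$$\begin{aligned} E(\hat f_i \hat f_j) = \frac{K^2}{m_i m_j} \Big( \theta_{i0} \theta_{j0} (1 + 2 \operatorname{var}(\hat\beta_0)) &+ 2 \operatorname{cov}(\hat\beta_0, \hat\beta_1) \theta_{i0} \Phi_{j1} + 2 \operatorname{cov}(\hat\beta_0, \hat\beta_1) \theta_{j0} \Phi_{i1} \\ &+ \operatorname{var}(\hat\beta_1) (\theta_{i0} \Phi_{j2} + \theta_{j0} \Phi_{i2} + \Phi_{i1} \Phi_{j1}) + \ldots \Big). \end{aligned} \qquad (63)$$

From equation 36 we can write

$$\begin{aligned} E(\hat f_i) E(\hat f_j) = \frac{K^2}{m_i m_j} \Big( \theta_{i0} \theta_{j0} &+ \theta_{i0} (\operatorname{var}(\hat\beta_0) \theta_{j0}/2 + \operatorname{cov}(\hat\beta_0, \hat\beta_1) \Phi_{j1} + \operatorname{var}(\hat\beta_1) \Phi_{j2}) \\ &+ \theta_{j0} (\operatorname{var}(\hat\beta_0) \theta_{i0}/2 + \operatorname{cov}(\hat\beta_0, \hat\beta_1) \Phi_{i1} + \operatorname{var}(\hat\beta_1) \Phi_{i2}) + \ldots \Big), \end{aligned} \qquad (64)$$

which, when subtracted from equation 63 gives

$$\operatorname{cov}(\hat f_i, \hat f_j) = \frac{K^2}{m_i m_j} \left( \operatorname{var}(\hat\beta_0) \theta_{i0} \theta_{j0} + \operatorname{cov}(\hat\beta_0, \hat\beta_1) (\theta_{i0} \Phi_{j1} + \theta_{j0} \Phi_{i1}) + \operatorname{var}(\hat\beta_1) \Phi_{i1} \Phi_{j1} + \ldots \right). \qquad (65)$$

The form of equation 65 applies when both intervals
have nonzero slope.

Development of expressions for the covariance
between intervals for the other two situations is
done in an analogous way using equations 36, 40,
41, and 44 as appropriate. For the case where both
intervals are of zero slope, this results in

$$\operatorname{cov}(\hat f_i, \hat f_j) = (t_i - t_{i-1}) (t_j - t_{j-1}) \, e^{2\beta_0} (s_i s_j)^{\beta_1} \left( \operatorname{var}(\hat\beta_0) + \operatorname{var}(\hat\beta_1) \ln s_i \ln s_j + \operatorname{cov}(\hat\beta_0, \hat\beta_1) \ln (s_i s_j) + \ldots \right). \qquad (66)$$

Lastly

$$\operatorname{cov}(\hat f_i, \hat f_j) = \frac{K}{m_i} (t_j - t_{j-1}) \, e^{\beta_0} s_j^{\beta_1} \left( \operatorname{var}(\hat\beta_0) \theta_{i0} + \operatorname{var}(\hat\beta_1) \Phi_{i1} \ln s_j + \operatorname{cov}(\hat\beta_0, \hat\beta_1) (\Phi_{i1} + \theta_{i0} \ln s_j) + \ldots \right) \qquad (67)$$

for the case where the ith interval has nonzero and
the jth has zero slope.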
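The three covariance cases share one structure: each interval contributes a pair of first-order coefficients (the multipliers of $\delta\beta_0$ and $\delta\beta_1$ in equations 40 and 44), and every variance or covariance is a bilinear form in those pairs. The sketch below uses that observation to evaluate equation 60; it is an illustrative restatement rather than the paper's program, and the names `first_order_coefficients`, `interval_covariance`, and `daily_variance` are hypothetical.

```python
import math


def first_order_coefficients(t_prev, s_prev, t_cur, s_cur, b0, b1):
    """Multipliers (a0, a1) of delta-beta0 and delta-beta1 in the interval flow expansion."""
    dt = t_cur - t_prev
    if s_cur == s_prev:
        # Zero slope interval (equation 40).
        a0 = dt * math.exp(b0) * s_cur ** b1
        return a0, a0 * math.log(s_cur)
    # Nonzero slope interval (equation 44): (K/m) * theta_i0 and (K/m) * Phi_i1.
    p = 1.0 + b1
    m = (s_cur - s_prev) / dt
    k_over_m = math.exp(b0) / (p * m)
    theta0 = s_cur ** p - s_prev ** p
    theta1 = s_cur ** p * math.log(s_cur) - s_prev ** p * math.log(s_prev)
    return k_over_m * theta0, k_over_m * (theta1 - theta0 / p)


def interval_covariance(ci, cj, var_b0, var_b1, cov_b0_b1):
    """Covariance between two interval flows; reduces to the variance when ci is cj."""
    return (
        var_b0 * ci[0] * cj[0]
        + var_b1 * ci[1] * cj[1]
        + cov_b0_b1 * (ci[0] * cj[1] + cj[0] * ci[1])
    )


def daily_variance(points, b0, b1, var_b0, var_b1, cov_b0_b1):
    """Equation 60: sum of interval variances plus twice the pairwise covariances."""
    coeffs = [
        first_order_coefficients(t0, s0, t1, s1, b0, b1)
        for (t0, s0), (t1, s1) in zip(points, points[1:])
    ]
    total = 0.0
    for i, ci in enumerate(coeffs):
        total += interval_covariance(ci, ci, var_b0, var_b1, cov_b0_b1)
        for cj in coeffs[i + 1:]:
            total += 2.0 * interval_covariance(ci, cj, var_b0, var_b1, cov_b0_b1)
    return total


if __name__ == "__main__":
    # Hypothetical day: (hours, feet of true stage).
    day = [(0.0, 1.2), (6.0, 1.2), (10.0, 1.8), (24.0, 1.4)]
    print(daily_variance(day, 0.3, 1.7, 0.01, 0.02, -0.005))
```

One consequence of this structure is that the double sum equals a single quadratic form in the summed coefficients, which gives an inexpensive cross-check on an implementation.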

Results 

A  computer  program  has  been  written  to  calcu- 
late values  of  bias  and  variance  according  to 
equations  reported  in  this  paper.  Several  conse- 
quences resulting  from  the  model  were  noticed  and 
investigated  during  the  program  checkout. 

One  of  these  was  that  all  values  of  bias  for  all 
data  used  (both  real  and  contrived)  were  positive. 
It  seemed  plausible  that  there  would  be  intervals 
with properties which would give negative bias. An
investigation of equations 38 and 42, however,
revealed that negative bias is not possible under the
model assumed in this paper. The proof of this fact
is  not  difficult,  but  is  tedious,  so  only  an  outline 
will  be  given  here. 

Consider equation 38 as two factors. The first,
$K/m_i$, will be positive or negative as the interval's
slope is positive or negative, respectively. Then for
the bias to be always positive we need to show that
the second factor shown bracketed in equation 38
always has the same sign as the first. This condition
can be shown by writing the bracketed quantity
as the difference between two values of the same
function, say $h(s)$. Then equation 38 can be expressed as

$$E(\omega_i) = \frac{K}{m_i} (h(s_i) - h(s_{i-1})). \qquad (68)$$

We want to show that $h(s)$ is increasing
monotonically. To do so, form $h'(s)$, and note that
$h'(1) > 0$. That is, at least one value of $h'(s)$ is
positive. Since $h'(s)$ is continuous for $s > 0$, $h'(s)$
must have real roots for any values to be negative.
Setting $h'(s) = 0$ and simplifying the equation
results in a quadratic in $\ln s$. Substituting the
standard regression estimates for $\operatorname{var}(\hat\beta_0)$, $\operatorname{var}(\hat\beta_1)$,
and $\operatorname{cov}(\hat\beta_0, \hat\beta_1)$ and requiring the discriminant
of the quadratic to be nonnegative result in

$$\bar x^2 \ge \frac{\sum x_i^2}{n} \qquad (69)$$

where the $x_i$'s are the rating points of equations 6
and 7. Cauchy's inequality, however, contradicts
equation 69 except in the case of equality. Equality
requires all rating points to be taken at the same
stage, so for any nontrivial case the discriminant
will be negative, and no real roots exist. Therefore,
$h(s)$ is a monotonically increasing function when
$s > 0$, so that

$$m_i > 0 \Rightarrow h(s_i) - h(s_{i-1}) > 0,$$

and

$$m_i < 0 \Rightarrow h(s_i) - h(s_{i-1}) < 0. \qquad (70)$$
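The steps of this outline can be written out explicitly. The following worked sketch assumes the bracketed quantity of equation 38 is split using the definitions in equation 37, and uses the usual simple linear regression expressions for the variances and covariance; the symbol $S_{xx} = \sum (x_i - \bar x)^2$ is introduced here for convenience and does not appear in the paper.

```latex
\begin{aligned}
h(s) &= s^{1+\beta_1}\Big[\tfrac{1}{2}\operatorname{var}(\hat\beta_0)
      + \operatorname{cov}(\hat\beta_0,\hat\beta_1)\Big(\ln s - \tfrac{1}{1+\beta_1}\Big)
      + \operatorname{var}(\hat\beta_1)\Big(\tfrac{(\ln s)^2}{2}
      - \tfrac{\ln s}{1+\beta_1} + \tfrac{1}{(1+\beta_1)^2}\Big)\Big], \\
h'(s) &= \tfrac{1+\beta_1}{2}\, s^{\beta_1}\Big[\operatorname{var}(\hat\beta_0)
      + 2\operatorname{cov}(\hat\beta_0,\hat\beta_1)\ln s
      + \operatorname{var}(\hat\beta_1)(\ln s)^2\Big], \\
h'(1) &= \tfrac{1+\beta_1}{2}\operatorname{var}(\hat\beta_0) > 0 .
\end{aligned}

% Real roots in ln s require cov^2 >= var(b0) var(b1). With
% var(b0) = sigma^2 sum(x_i^2) / (n S_xx), var(b1) = sigma^2 / S_xx,
% cov(b0, b1) = -sigma^2 xbar / S_xx, this difference is
\operatorname{cov}(\hat\beta_0,\hat\beta_1)^2
  - \operatorname{var}(\hat\beta_0)\operatorname{var}(\hat\beta_1)
  = \frac{\sigma^4}{S_{xx}^2}\Big(\bar x^2 - \frac{\sum x_i^2}{n}\Big)
  = -\frac{\sigma^4}{n\,S_{xx}} < 0 .
```

The terms in $h'(s)$ that involve $1/(1+\beta_1)$ cancel when the product rule is applied, which is why the quadratic in $\ln s$ has such a simple form.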

For a zero slope interval, we consider equation
42. The first three factors are always positive. The
bracketed quantity can be considered a continuous
quadratic in $\ln s$ for $s > 0$. Again, if $s_i = 1$ the
quantity is positive, so if the quadratic is to be negative,
it must have real roots. Substituting the variance
and covariance estimates into the bracketed factor
in equation 42, forming the discriminant of the
quadratic, and requiring it to be nonnegative results
in the identical contradiction as in the nonzero
slope case.

Thus,  under  the  model  postulated,  all  biases  will 
be  positive  for  any  nontrivial  case.  This  condition 
removes  the  possibility  of  the  total  bias  being 
lowered  owing  to  compensating  effects  of  long  term 
records. 
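The positivity argument can be spot-checked numerically. The sketch below assumes the textbook regression formulas for the coefficient variances and covariance (with $\sigma^2$ set to an arbitrary value) and evaluates the bracketed quadratic of equation 42; `regression_moments`, `discriminant`, and `bias_bracket` are hypothetical helper names.

```python
import math


def regression_moments(x, sigma2=1.0):
    """Return var(b0), var(b1), cov(b0, b1) for simple linear regression on x."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    var_b0 = sigma2 * sum(xi * xi for xi in x) / (n * sxx)
    var_b1 = sigma2 / sxx
    cov_b0_b1 = -sigma2 * x_bar / sxx
    return var_b0, var_b1, cov_b0_b1


def discriminant(x, sigma2=1.0):
    """Discriminant of var(b1) L^2 + 2 cov L + var(b0) as a quadratic in L = ln s."""
    var_b0, var_b1, cov_b0_b1 = regression_moments(x, sigma2)
    return 4.0 * (cov_b0_b1 ** 2 - var_b0 * var_b1)


def bias_bracket(ln_s, var_b0, var_b1, cov_b0_b1):
    """Bracketed factor of equation 42, truncated at second order."""
    return 0.5 * var_b0 + cov_b0_b1 * ln_s + 0.5 * var_b1 * ln_s ** 2


if __name__ == "__main__":
    # Hypothetical rating stages (feet), converted to x = ln s.
    x = [math.log(s) for s in (0.8, 1.0, 1.5, 2.0)]
    print(discriminant(x))  # negative unless all rating stages coincide
```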

Another effect of the model is that most
covariances between interval flows are positive.
Again, one would expect some to be positive and
others to be negative. Intuitively, a negative
covariance between two interval flows means that as
either of $\hat\beta_0$ or $\hat\beta_1$ vary in one direction, one flow
rises as the other drops. It is clear from either of
equations 17 that as $\hat\beta_0$ rises so does $\hat f_i$ regardless
of the values of $s_i$ and $s_{i-1}$. Furthermore, for the
case where $s_{i-1} = s_i$, $\hat f_i$ rises for rising $\hat\beta_1$ if $s_i > 1$
and falls for rising $\hat\beta_1$ if $s_i < 1$. It can be shown for
the case where $s_{i-1} \ne s_i$ that a similar situation exists
if both starting and ending stages are above or
below one. For intervals where one stage is above
one and the other below one, the direction of
variation depends on the magnitude of the stages.

Thus,  the  covariance  between  flows  during  in- 
tervals having  stages  above  one  will  always  be 
positive.  Covariances  between  intervals  with  the 
several  other  combinations  of  stages  in  relation  to 
one  may  be  either  positive  or  negative,  but  for  the 
actual  data  processed  so  far,  none  of  the  covari- 
ances was  negative.  It  was  possible  to  contrive  a 
situation  which  produced  negative  covariances. 

Application  of  Formulas  to  Actual  Data 

To illustrate the effects of applying the equations
to real data, sample days were selected from two
streams, one in Colorado, the other in California.
One of the streams was the South Fork of Lake
Creek in central Colorado. An artificial control
monitors drainage from this alpine watershed of
7,300 acres. The control is an adaptation of a
design discussed by Brown.⁶ The hydrograph is
recorded on punched paper tape with a readout
interval of 15 minutes. Before use, all superfluous
points, that is, those that did not alter the shape
of the hydrograph, were removed.
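One way such a thinning step might be carried out is sketched below. The paper does not say how superfluous points were identified, so the collinearity test, the tolerance `tol`, and the function name `remove_superfluous_points` are all assumptions made for illustration: an interior point is dropped when it lies on the straight line joining the last retained point and the next reading.

```python
def remove_superfluous_points(points, tol=1e-9):
    """Drop interior (time, stage) points that do not change the piecewise-linear trace."""
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    for current, following in zip(points[1:-1], points[2:]):
        t0, s0 = kept[-1]
        t1, s1 = current
        t2, s2 = following
        # Cross product is zero when the three points are collinear.
        cross = (t1 - t0) * (s2 - s0) - (s1 - s0) * (t2 - t0)
        if abs(cross) > tol:
            kept.append(current)
    kept.append(points[-1])
    return kept


if __name__ == "__main__":
    # Hypothetical 15-minute readings: (hours, feet of stage).
    readings = [(0.0, 1.0), (0.25, 1.0), (0.5, 1.0), (0.75, 1.1), (1.0, 1.2), (1.25, 1.2)]
    print(remove_superfluous_points(readings, tol=1e-6))
```

With real tape data a tolerance tied to the recorder's resolution would probably be more appropriate than the default used here.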

The other stream was Rush Creek in California's
southern Sierra Nevada. A chart recorder in a
natural channel there monitors drainage from a
10,885-acre watershed.

Data for Lake Creek during 3 days in 1968 were
processed (table 1). Both the bias and standard
deviation were expressed as a percentage of the
calculated flow.

The 1968 rating for Lake Creek consisted of 21
points from 0.79 to 2.015 feet of stage. The data
closely fit the model as indicated by an F statistic
of 3238 and standard error of estimate of 0.0676.
The significance probability is < 0.001. The values
of percent standard error are quite low and those of
bias are far less than 1 percent; however, the errors
are larger for those days with stages close to the
extremities of the span of rating points than for other
days. This pattern was evident for all data run.

⁶ Brown, Harry E. COMBINED CONTROL-METERING SECTION FOR
GAGING LARGE STREAMS. Water Resources Res. 5(4): 888-894,
1969.

Table 1. Errors in daily flow ¹ for three selected
days at Lake Creek, 1968

Date        Range in stage ²    Bias        Standard error
            (feet)              (percent)   (percent)

May 4       0.70-0.84           0.06        3.31
May 29      1.50-1.79           0.02        2.03
June 14     1.74-2.00           0.03        2.57

¹ Expressed as percent of estimated flow.
² Range of rating stages is 0.79 to 2.02 feet.


Table 2. Regression statistics for Rush Creek:
real and generated data

Item                          Field data    Sample 38    Sample 39

F statistic                   50.3          304.8        32.8
Significance probability      0.00623       < 0.001      0.00254
Standard error of estimate    0.211         0.106        0.357
Number of observations        5             8            8

These  data  contrast  with  those  from  Rush  Creek 
for  five  consecutive  days  in  1971  (fig.  1;  tables  2,  3). 
The  five  rating  points  have  the  range  shown  along 
the  left  side  of  the  graph  (fig.  1).  The  standard  errors 
from  the  Rush  Creek  data  were  much  higher  than 
those  from  the  Lake  Creek  data.  Errors  were 
greatest for May 26 and May 28, which have values
of  stage  closest  to  the  ends  of  the  rating  span. 

Figure 1.—Rush Creek hydrograph, May 26-30.

Table 3.—Percent errors in daily flow for five consecutive days in Rush Creek for field and generated data, May 1971

        Field data                Sample 38                 Sample 39
Date    Bias    Standard error    Bias    Standard error    Bias    Standard error
26      0.77    12.40             0.18    6.06              2.08    20.37
27      0.68    10.07             0.10    3.75              1.12    12.64
28      1.01    13.60             0.13    4.68              1.43    15.85
29      0.45     9.45             0.08    4.05              0.94    13.62
30      0.48     9.78             0.10    4.47              1.13    15.04

To show the effect of the rating regression fit on the estimated biases and standard errors, 40 samples of eight rating points each were generated. The samples were generated about the regression line resulting from the field data, using a random normal generator with variance equal to the regression estimate. The eight points were spaced equally and covered the same range as the field data.
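The generation step described above can be sketched as follows. This is only an illustration under stated assumptions: the intercept `b0`, slope `b1`, and stage range are hypothetical placeholders (the fitted rating coefficients are not given in this section), the standard deviation 0.211 is the field-data standard error of estimate from table 2, and the variables are taken to be in whatever transformed units the rating regression uses.

```python
import random

def generate_rating_sample(b0, b1, see, x_min, x_max, n_points=8, rng=None):
    """Return n_points (x, y) pairs scattered about the line y = b0 + b1*x.

    The x values are equally spaced over [x_min, x_max]; the scatter is
    normal with standard deviation `see` (the standard error of estimate).
    """
    rng = rng or random.Random()
    step = (x_max - x_min) / (n_points - 1)
    xs = [x_min + i * step for i in range(n_points)]
    ys = [b0 + b1 * x + rng.gauss(0.0, see) for x in xs]
    return list(zip(xs, ys))

# Hypothetical coefficients and range; 40 samples of eight points each.
samples = [
    generate_rating_sample(0.5, 2.0, 0.211, 0.1, 0.6, rng=random.Random(seed))
    for seed in range(40)
]
```

Each generated set can then be refit by least squares and run against the same hydrograph, which is how two sets of equal size but different fit can be compared.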

Results  of  using  two  of  these  rating  sets  with 
the  Rush  Creek  hydrograph  data  are  shown  in 
table  3.  The  sets  were  selected  to  illustrate  the 
extremes of fit of the sample data sets about their
regression  lines.  Sample  38  fits  quite  well,  with  a 
small  significance  probability  and  standard  error 
of  estimate.  The  fit  for  sample  39  was  not  nearly  as 
good,  a  fact  reflected  in  larger  errors.  Even  though 
the  fit  was  poor,  it  would  be  considered  satisfactory 
by  commonly  used  criteria. 

These  results  are  not  unexpected  in  a  qualitative 
sense.  We  expect  better  estimates  from  an  artificial 
control,  such  as  Lake  Creek,  where  the  rating  points 
lie close to the regression line. Additional sources
of  variance  in  the  natural  control  in  Rush  Creek 
resulted  in  much  larger  bias  and  standard  error. 

This comparison is confounded by differences in the ranges as well as the numbers of rating points in the two streams. Study of the equations estimating variances and covariance for the β's shows that their values drop as the numbers of points and the distance of the independent variable from its mean increase. Considering equal size sets of rating points taken for the same stream at the same stages, the bias and variance in any interval are directly related to the square of the standard error of estimate. This relationship can be seen by substituting equations for estimating var(β₀), var(β₁), and cov(β₀, β₁) into equations 38, 42, 49, and 50.
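The dependence on the standard error of estimate can be illustrated with the textbook formulas for a straight line fitted by ordinary least squares. This is a sketch that assumes the rating is such a line; it does not reproduce the paper's equations 38, 42, 49, and 50, and the function name and example stages are hypothetical.

```python
def coefficient_variances(xs, see):
    """Textbook OLS estimates of var(b0), var(b1), and cov(b0, b1).

    xs  : values of the independent variable at the rating points
    see : standard error of estimate of the fitted line
    """
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)   # spread of x about its mean
    var_b1 = see ** 2 / sxx
    var_b0 = see ** 2 * (1.0 / n + mean_x ** 2 / sxx)
    cov_b0_b1 = -(see ** 2) * mean_x / sxx
    return var_b0, var_b1, cov_b0_b1

narrow = coefficient_variances([0, 1, 2, 3, 4], see=1.0)
wide = coefficient_variances([0, 2, 4, 6, 8], see=1.0)
doubled = coefficient_variances([0, 1, 2, 3, 4], see=2.0)
```

All three quantities scale with the square of `see`, and the slope variance falls as the rating points spread farther from their mean, which is the behavior the paragraph describes.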

The  biases  for  Rush  Creek  are  larger  than  those 
for  Lake  Creek,  but  they  appear  small  in  an  absolute 
sense. The magnitude of the bias must be considered, however, in relation to the objectives of
each  individual  study. 

Although  these  results  give  some  idea  of  error 
magnitudes,  they  depend  greatly  on  each  case.  Each 
situation  should  be  investigated  individually  with 
regard  to  the  specific  problems  and  goals  involved. 

Effect  of  Interval  Type  and  Location 

It  might  be  supposed  that  an  interval's  slope  or 
length  would  play  a  part  in  its  expected  errors.  In 
all data run, the interval characteristics affected
errors only by influencing the interval's closeness
to the extremes of the range of rating points. That
is, an interval, regardless of length or slope, will
tend  to  have  small  errors  as  long  as  its  end  points 
lie  close  to  the  center  of  this  range,  and  the  errors 
increase  as  the  interval  approaches  the  extremes. 
Intervals  with  steep  slopes  will  tend  to  have  larger 
errors  only  because  such  slopes  must  have  their 
end  points  closer  to  the  extremes. 

This  effect  is  shown  in  figures  2  and  3.  A  test 
hydrograph  for  1  day  was  constructed  with  constant 
slope  so  that  its  range  of  stages  exceeds  that  used 
in  the  rating  process.  The  day  was  divided  into  16 
intervals of 1½ hours each. Results of running this
hydrograph  with  the  Lake  Creek  rating  data  are 
shown  in  figure  2  and  with  the  Rush  Creek  data  in 
figure  3.  Errors  calculated  for  each  interval  are 
plotted  in  the  graph  directly  above.  The  span  of 
rating  points  is  indicated  in  each  case. 

For both examples, the errors increase toward the extremes of the rating span, but for the Rush Creek data the increase is much more pronounced. Errors for Lake Creek, while rising toward the extremes, remain quite low even outside the span.

Figure 2.—Comparison of errors and range of rating stages using artificial hydrograph. Lake Creek, 1968.

These  errors  are  expressed  as  percent  of  flow. 
The  bias  in  a  day's  flow  is  an  average  of  the  percent 
bias  of  each  component  interval  weighted  by  the 
respective interval flows. Thus, the bias and standard error for Rush Creek are greatest for May 28
even  though  the  graph  (fig.  1)  for  the  latter  part  of 
the  day  is  in  the  center  of  the  rating  span.  This 
effect  is  due  to  the  higher  flow  being  nearer  to  the 
upper  extreme. 
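The flow weighting described above can be written out in a few lines. This is a minimal sketch; the function name and the example numbers are hypothetical and are not taken from the Rush Creek record.

```python
def daily_percent_bias(interval_flows, interval_percent_biases):
    """Flow-weighted average of the percent bias of each interval in a day."""
    total_flow = sum(interval_flows)
    weighted = sum(q * b for q, b in zip(interval_flows, interval_percent_biases))
    return weighted / total_flow

# Hypothetical day with two intervals: the high-flow interval dominates.
example = daily_percent_bias([1.0, 3.0], [0.5, 1.5])
```

Because the weights are the interval flows, an interval near the upper extreme of the rating span contributes more to the daily figure than a low-flow interval with the same percent bias.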

Summary 

One method of estimating streamflow from continuous-stage records is to partition the record into intervals defined by abrupt changes in hydrograph slope. Flow for each interval can then be calculated by using an "exact" equation and parameters can be estimated by "rating" the stream. The statistical errors involved in such a method are considered by developing approximate estimators of bias and variance for interval flows. Other equations are derived to estimate values of these errors for daily flows.

The  estimators  result  from  a  model  which 
assumes  that  the  variation  in  flows  estimated  for 
an  interval  results  solely  from  errors  in  estimating 
the  parameters  in  the  rating  equation  and  the 
propagation  of  these  errors  through  the  function 
expressing  flow  during  an  interval. 

A  computer  program  was  used  to  calculate 
values  of  these  estimators  for  field  records.  Data 
from  selected  days  on  two  streams  were  run  with 
the program to illustrate error properties and magnitudes. One stream had an artificial and the other a natural control.

Figure 3.—Comparison of errors and range of rating stages using artificial hydrograph. Rush Creek, 1971.

For all data run, values of bias were positive. Subsequent investigation showed that this is a property of the model as postulated; that is, regardless of the parameters of an interval, the model, on the average, always overestimates the flow. This condition removes the possibility of biases for different interval types cancelling over long-term records.

Although  always  positive,  the  values  of  bias  for 
the  real  data  were  also  small  — mostly  less  than  1 
percent  of  the  estimated  flow.  Values  of  standard 
error  were  larger,  however,  ranging  up  to  13  percent 
of  estimated  flow. 

Two  major  factors  influence  the  magnitude  of  both 
bias  and  standard  error.  The  first  is  the  fit  of  the 
points  used  in  rating  the  stream  about  its  regression 
line. Greater standard error of estimate resulting from poorer fit of rating points for the natural control was reflected in larger errors. This effect was also illustrated by using generated data. Two equal size sets of rating points having widely differing standard errors of estimate were selected. These sets, in turn, were run with the hydrograph data from the natural control. The set of data with the lower standard error of estimate had considerably lower errors. This test removed the effect of size of the set of rating points, which confounded the results observed between the two different streams.

The  second  factor  influencing  error  magnitude 
was that flows toward the extremes of the range of
stages  used  in  rating  the  stream  had  larger  errors. 
This  difference  emphasizes  the  importance  of 
collecting  a  set  of  rating  points  which  adequately 
span  the  range  of  stages  which  will  be  of  most 
concern  for  a  particular  objective. 

The model described in this paper is useful for analyzing errors in continuous streamflow records when data are selected at intervals. But more experience must be gained before general results can be reported. Such experience will result from using the model on streamflow records acquired under widely differing rating conditions. One such condition would be one in which the rating function cannot be adequately represented by a single linear function. The model may prove helpful in separating rating points to give the optimal rating function in this situation.

The  model  should  be  expanded  to  account  for 
more  sources  of  error  if  the  results  show  they  are 
significant.  These  would  include  errors  in  reading 
stages and times, selecting breakpoints, and determining the virtual stage for zero flow.


Results of such studies would dictate the feasibility of applying corrections for bias in estimating
flows.  They  should  also  suggest  procedures  in  the 
stream  measuring  process  which  will  reduce  both 
the  bias  and  uncertainty  in  flow  estimates  to  a 
minimum. 


FLOOD  ESTIMATION  IN  THE  PRESENCE  OF  OUTLIERS 

By  W.  Kirby ' 


Introduction,  Summary,  and 
Conclusions 

Frequency analysis of flood discharge records often is complicated by the presence of one or more observations which lie far above the other observations in the record. We use the term "outlier" to denote those large observations which survive tests for the rejection of spurious observations, yet still are so far out as to be considered extremely unlikely events. Inclusion of such outliers in the ordinary estimates of the mean, standard deviation, and other statistical parameters thus might badly distort these estimates. On the other hand, because the outliers could not be rejected as spurious, their exclusion might distort the estimates in the other direction. Such outliers thus impede the processing of statistical data on flood discharges by necessitating professional interpretation of all results and subjective adjustments of those results affected by outliers. This constitutes the so-called outlier problem of flood statistics.

This outlier problem possibly may be resolved
by using a weighted average of estimates obtained
with and without the outlier. The relative
weights  to  be  put  on  the  two  estimates  would  be  a 
function  of  the  distance  between  the  outlier  and 
the  sample  mean,  measured  in  units  of  the  sample 
standard  deviation.  This  general  strategy  would  be 
applied to estimates of all statistical parameters
and  possibly  might  be  adapted  to  multiple-outlier 
problems  as  well.  The  weighting  functions  to  be 
used  in  these  several  cases  presumably  would 
depend  on  such  factors  as  the  sample  size,  the 
parameter  being  estimated,  and  possibly  (but 
hopefully  not)  the  underlying  population  from  which 
the  sample  was  assumed  to  be  drawn. 

This report presents the theoretical derivation of a weight function minimizing the mean square error (MSE) of the parameter being estimated. This analysis applies to a large class of parameters (including the population mean plus a specified number of standard deviations) and arbitrary sample sizes and underlying populations. The mean square error of the resulting weighted estimate is less than that of either of its constituent estimates. As theoretical analysis reveals neither the magnitude of this improvement nor the shape of the weight function, the basic theoretical results are further analyzed and recast in a form amenable to numerical computation. The computational formulas are summarized at the end of the report.

Preliminary computations have been carried out for estimates of the parameter $\theta = \mu + \kappa\sigma$ (the mean plus a specified number of standard deviations). The resulting optimal weight function seems to depend primarily on the sample size, to a lesser degree on the skew of the underlying population, and hardly at all on the higher population moments or the value of the specified constant $\kappa$. These computations indicated, in addition, that for samples with relatively large outliers the optimal (weighted) estimate had substantially smaller mean square errors than either the full-sample or censored-sample estimates. All three estimates were essentially equivalent for samples without large outliers.

To summarize the results of these computations, we fitted simple straight line segments to the computed optimal weight functions. Although the fit was hardly satisfying, we hoped that it would yield a near-optimal weight function in the sense of mean square error of estimation. Straightforward simulation using the known sample size and sample estimates of the skew coefficient to determine the weights indicates that this approximate weight function's performance is just as good as the originally computed weight function's. As before, the weighted estimate has about the same root mean square error (RMSE) as the unweighted estimates for samples without large outliers and for the 2-year return period. On the other hand, for estimates of the 10-year and 100-year floods based on the 10 percent or so of samples with large outliers, use of the proposed weighted estimate instead of the unweighted full-sample estimate cuts the RMSE approximately in half, the absolute magnitude of the improvement being as much as 2.5 standard deviations of the underlying population.

In  conclusion,  then,  it  appears  that  even  a  crudely 
constructed  weighted  average  of  full-sample  and 
censored-sample  estimates  is  substantially  more 
accurate  than  either  of  its  constituent  estimates 
when  the  sample  contains  a  large  outlier  and  is  of 
equivalent accuracy when the sample does not contain an outlier. Therefore, such an estimator is routinely applicable to a larger class of samples than the ordinary full-sample estimate and should effect
a  significant  reduction  of  the  outlier  problem. 


Mathematical Formulation

A random sample $x = (x_1, \ldots, x_N)$ of size $N$ is given. This sample is a particular realization of the $N$-dimensional random vector $X = (X_1, \ldots, X_N)$, where the $X_i$ are independent random variables all having the same unknown probability distribution $F_0$. It is required to use this random sample to estimate an unknown parameter $\theta$ of the underlying distribution $F_0$. Of particular interest is the parameter

$$\theta = \mu + \kappa\sigma \tag{1}$$

where $\kappa$ is a given constant and $\mu$ and $\sigma$ are the unknown mean and standard deviation of the underlying distribution $F_0$.

We consider three estimates of this parameter:

$$\hat{\theta} = \bar{x} + \kappa\hat{\sigma} \tag{2}$$

$$\tilde{\theta} = \tilde{x} + \kappa\tilde{\sigma} \tag{3}$$

$$\theta_a = \hat{\theta} - a(r(x))\,(\hat{\theta} - \tilde{\theta}). \tag{4}$$

The estimate $\hat{\theta}$ is based on the whole sample, while $\tilde{\theta}$ is obtained by deleting the largest observation from the sample. The estimate $\theta_a$ is a weighted average of $\hat{\theta}$ and $\tilde{\theta}$. The statistics $\bar{x}$ and $\hat{\sigma}$ are the ordinary estimates of the mean and standard deviation:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \tag{5}$$

$$\hat{\sigma}^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2. \tag{6}$$

The statistics $\tilde{x}$ and $\tilde{\sigma}$ are computed similarly, but from the sample of size $N-1$ obtained by dropping the largest observation from $x$. In computing the estimate $\theta_a$, the relative weights to be put on $\hat{\theta}$ and $\tilde{\theta}$ are given by a function $a$ of the statistic

$$r(x) = \frac{x_{\max} - \bar{x}}{\hat{\sigma}} \tag{7}$$

which measures the distance between the top observation and the bulk of the sample. In practical applications, the weight function $a$ would be given; it is the purpose of the present study to find a good weight function to use in practice.
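The three estimates can be sketched in a few lines. This is an illustration only: the function names and the sample values are hypothetical, the standard deviations use the N-1 divisor of `statistics.stdev`, and the `weight` argument stands in for whatever weight function is eventually adopted.

```python
import statistics

def outlier_statistic(sample):
    """Distance of the largest observation from the sample mean,
    in units of the sample standard deviation."""
    return (max(sample) - statistics.mean(sample)) / statistics.stdev(sample)

def weighted_estimate(sample, kappa, weight):
    """Weighted average of the full-sample and censored-sample estimates.

    weight : callable mapping the outlier statistic r to a weight a(r)
    """
    full = statistics.mean(sample) + kappa * statistics.stdev(sample)
    censored_sample = sorted(sample)[:-1]        # drop the largest observation
    censored = (statistics.mean(censored_sample)
                + kappa * statistics.stdev(censored_sample))
    a = weight(outlier_statistic(sample))
    return full - a * (full - censored)

peaks = [1.0, 2.0, 3.0, 4.0, 10.0]               # hypothetical flood peaks
halfway = weighted_estimate(peaks, kappa=2.0, weight=lambda r: 0.5)
```

A weight of 0 returns the full-sample estimate and a weight of 1 returns the censored-sample estimate, so any weight between them interpolates between the two.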

Selection  of  the  Weight  Function 

The weight function $a$ is chosen to minimize the MSE of the estimator $\theta_a$. For any choice of $a$, the MSE is the expected value (with respect to the underlying distribution $F_0$) of the squared difference between the population parameter $\theta$ and the estimate $\theta_a$ obtained from a random sample $X$ drawn from $F_0$. Expressing the MSE as a function of the weighting function $a$, we have

$$\begin{aligned}
\mathrm{MSE}(a) &= E\,(\theta_a(X) - \theta)^2 \\
&= \int_{R^N} F(dx)\,(\theta_a(x) - \theta)^2 \\
&= \int_{R^N} F(dx)\,\bigl[\hat{\theta}(x) - \theta - a(r(x))\,(\hat{\theta}(x) - \tilde{\theta}(x))\bigr]^2
\end{aligned} \tag{8}$$

where $R^N$ is the space of $N$-dimensional samples $x = (x_1, \ldots, x_N)$ and $F(dx)$ is the probability that a random sample $X$ drawn from $F_0$ will fall in an infinitesimal cell $dx$ containing the point $x$.

The minimizing $a$-function now must be found.

This problem may be greatly simplified and solved by elementary calculus by an appropriate choice of the family of candidate $a$-functions. We first decide that we will consider only step functions with a finite number of steps. That is, $a$ is to be of the form

$$a(r) = a_k \quad \text{for } r \in I_k \quad (k = 1, \ldots, L) \tag{9}$$

where the $a_k$ are constants to be determined and the $I_k$ are intervals which conveniently partition the range of possible $r$ values. It will be seen below that for any specified sample size $N$, the range of possible $r$-values is finite. Thus, almost any function could be approximated as closely as desired by a step function based on a suitably fine finite partition. We second decide that if it makes MSE($a$) smaller, we will accept $a_k$-values which are not proper weights, that is, $a_k$ values outside the interval $[0, 1]$.

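The first of these assumptions effectively converts MSE($a$) into a function of $L$ real variables $a_1, \ldots, a_L$. An explicit representation in terms of the $a_k$ is obtained as follows. Let $K_k$ be that collection of sample points $x$ having $r$-values in the interval $I_k$. That is,

$$K_k = \{x : r(x) \in I_k\}. \tag{10}$$

The sets $K_k$ partition $R^N$ so that MSE($a$) can be written as a (finite) sum of integrals over the $K_k$. Moreover, $a(r(x)) = a_k$ throughout $K_k$. Thus,

$$\mathrm{MSE}(a) = \sum_{k=1}^{L} \int_{K_k} F(dx)\,\bigl[\hat{\theta} - \theta - a_k(\hat{\theta} - \tilde{\theta})\bigr]^2. \tag{11}$$

A somewhat more convenient representation of MSE($a$) is obtained by expanding the square in the integrand and rearranging terms. The result is

$$\mathrm{MSE}(a) = \int_{R^N} F(dx)\,(\hat{\theta} - \theta)^2 - 2\sum_{k=1}^{L} a_k \int_{K_k} F(dx)\,(\hat{\theta} - \theta)(\hat{\theta} - \tilde{\theta}) + \sum_{k=1}^{L} a_k^2 \int_{K_k} F(dx)\,(\hat{\theta} - \tilde{\theta})^2. \tag{12}$$

The first term in this result is the MSE of the ordinary estimate $\hat{\theta}$, $E(\hat{\theta} - \theta)^2$.

Because the $a_k$ are not constrained to the interval $[0, 1]$, MSE($a$) may be minimized by finding the points $a^*$ at which all the partial derivatives $\partial\,\mathrm{MSE}(a)/\partial a_i$ are zero. The partial derivatives are

$$\frac{\partial\,\mathrm{MSE}(a)}{\partial a_i} = -2\int_{K_i} F(dx)\,(\hat{\theta} - \theta)(\hat{\theta} - \tilde{\theta}) + 2a_i \int_{K_i} F(dx)\,(\hat{\theta} - \tilde{\theta})^2. \tag{13}$$

The sole stationary point of MSE is thus given by

$$a_i^* = \frac{\int_{K_i} F(dx)\,(\hat{\theta} - \theta)(\hat{\theta} - \tilde{\theta})}{\int_{K_i} F(dx)\,(\hat{\theta} - \tilde{\theta})^2}. \tag{14}$$

A second differentiation of MSE shows that the matrix of second partial derivatives is diagonal and nonnegative; thus the stationary point minimizes MSE.

The resulting MSE is, by substituting equation 14 in equation 12,

$$\mathrm{MSE}(a^*) = \mathrm{MSE}(\theta_{a^*}) = E(\hat{\theta} - \theta)^2 - \sum_{k=1}^{L} \frac{\bigl[\int_{K_k} F(dx)\,(\hat{\theta} - \theta)(\hat{\theta} - \tilde{\theta})\bigr]^2}{\int_{K_k} F(dx)\,(\hat{\theta} - \tilde{\theta})^2}. \tag{15}$$

Thus MSE($\theta_{a^*}$) is less than MSE($\hat{\theta}$), as would be expected. Similarly, MSE($\theta_{a^*}$) is less than MSE($\tilde{\theta}$).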
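The stationary-point formula (equation 14) can be approximated by simulation. The sketch below uses plain Monte Carlo over whole samples rather than the iterated integration developed later in the paper, and it assumes a standard normal parent (so that the true parameter is simply kappa); the bin edges, sample size, and trial count are hypothetical choices.

```python
import random
import statistics

def estimate_step_weights(n, kappa, edges, n_trials=2000, seed=0):
    """Monte Carlo estimate of the optimal step weights, one per r-interval.

    n     : sample size
    kappa : number of standard deviations in theta = mu + kappa*sigma
    edges : increasing r-values; interval k is [edges[k], edges[k+1])
    """
    rng = random.Random(seed)
    theta = kappa                      # standard normal parent: mu=0, sigma=1
    num = [0.0] * (len(edges) - 1)     # sums of (full - theta)*(full - cens)
    den = [0.0] * (len(edges) - 1)     # sums of (full - cens)**2
    for _ in range(n_trials):
        x = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        mean_x, sd_x = statistics.mean(x), statistics.stdev(x)
        full = mean_x + kappa * sd_x
        cens = statistics.mean(x[:-1]) + kappa * statistics.stdev(x[:-1])
        r = (x[-1] - mean_x) / sd_x
        for k in range(len(edges) - 1):
            if edges[k] <= r < edges[k + 1]:
                num[k] += (full - theta) * (full - cens)
                den[k] += (full - cens) ** 2
                break
    # An interval that received no samples has no estimate.
    return [nk / dk if dk > 0 else None for nk, dk in zip(num, den)]

weights = estimate_step_weights(n=10, kappa=1.0, edges=[0.0, 1.5, 2.0, 2.9])
```

The ratio of the two accumulated sums in each interval is the sample analogue of equation 14; nothing constrains the result to lie between 0 and 1, in keeping with the second decision above.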
Performance Measures

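Although theoretical analysis has shown that the estimator $\theta_a$ with weight function $a^*$ has a smaller MSE than either of the alternatives $\hat{\theta}$ or $\tilde{\theta}$, the analysis does not reveal how much smaller this error is. To decide whether the proposed estimator $\theta_{a^*}$ represents any significant improvement upon the ordinary estimator $\hat{\theta}$, it is necessary to measure the statistical performance of both these estimators.

The pertinent measures of performance in this case appear to be the mean square errors and biases of the estimators, conditioned on the events that $r(X)$ lies in specified $r$-intervals, and displayed as functions of $r$. Thus, for suitable $r$-intervals the performance measures to be considered are

$$\begin{aligned}
\mathrm{MSE}(\hat{\theta} \mid I_i) &= E\bigl[(\hat{\theta}(X) - \theta)^2 \mid r(X) \in I_i\bigr] \\
\mathrm{MSE}(\theta_{a^*} \mid I_i) &= E\bigl[(\theta_{a^*}(X) - \theta)^2 \mid r(X) \in I_i\bigr] \\
\mathrm{Bias}(\hat{\theta} \mid I_i) &= E\bigl[(\hat{\theta}(X) - \theta) \mid r(X) \in I_i\bigr] \\
\mathrm{Bias}(\theta_{a^*} \mid I_i) &= E\bigl[(\theta_{a^*}(X) - \theta) \mid r(X) \in I_i\bigr].
\end{aligned} \tag{16}$$

In addition, of course, the optimal $a^*$ should be found. Finally, for use in computing the conditional expectations and as an indication of the overall importance of the various $r$-intervals, the unconditional probabilities

$$F(I_i) = P(r(X) \in I_i) \tag{17}$$

must be computed.

The definitions of all these results involve integrals of the general form

$$E_i h = \int_{K_i} F(dx)\,h(x) \tag{18}$$

where $h$ is a specified function. Thus, from the definitions of expectation and conditional expectation

$$\begin{aligned}
a_i^* &= E_i(\hat{\theta} - \theta)(\hat{\theta} - \tilde{\theta}) \,/\, E_i(\hat{\theta} - \tilde{\theta})^2 \\
F(I_i) &= E_i(1) \\
\mathrm{MSE}(\hat{\theta} \mid I_i) &= E_i(\hat{\theta} - \theta)^2 / E_i 1 \\
\mathrm{MSE}(\theta_{a^*} \mid I_i) &= E_i(\hat{\theta} - \theta)^2 / E_i 1 - \bigl[E_i(\hat{\theta} - \theta)(\hat{\theta} - \tilde{\theta})\bigr]^2 / \bigl(E_i(\hat{\theta} - \tilde{\theta})^2\, E_i 1\bigr) \\
\mathrm{Bias}(\hat{\theta} \mid I_i) &= E_i(\hat{\theta} - \theta) / E_i 1 \\
\mathrm{Bias}(\theta_{a^*} \mid I_i) &= E_i(\hat{\theta} - \theta) / E_i 1 - E_i(\hat{\theta} - \theta)(\hat{\theta} - \tilde{\theta})\, E_i(\hat{\theta} - \tilde{\theta}) / \bigl(E_i(\hat{\theta} - \tilde{\theta})^2\, E_i 1\bigr).
\end{aligned} \tag{19}$$

This array of expressions can be made more manageable by defining vector-valued functions as follows:

$$h = \begin{bmatrix} h^1 \\ h^2 \\ h^3 \\ h^4 \\ h^5 \\ h^6 \end{bmatrix} = \begin{bmatrix} \hat{\theta} - \theta \\ \hat{\theta} - \tilde{\theta} \\ (\hat{\theta} - \theta)^2 \\ (\hat{\theta} - \tilde{\theta})^2 \\ (\hat{\theta} - \theta)(\hat{\theta} - \tilde{\theta}) \\ 1 \end{bmatrix} \tag{20}$$

$$E_i^j = E_i h^j \tag{21}$$

$$R_i = \begin{bmatrix} R_i^1 \\ R_i^2 \\ R_i^3 \\ R_i^4 \\ R_i^5 \\ R_i^6 \end{bmatrix} = \begin{bmatrix} E_i^5 / E_i^4 \\ E_i^6 \\ \bigl[E_i^3 / E_i^6\bigr]^{1/2} \\ \bigl[E_i^3 / E_i^6 - (E_i^5)^2 / (E_i^4 E_i^6)\bigr]^{1/2} \\ E_i^1 / E_i^6 \\ E_i^1 / E_i^6 - E_i^5 E_i^2 / (E_i^4 E_i^6) \end{bmatrix}. \tag{22}$$

The $j$th component of $R_i$ corresponds to the $j$th of equations 19, except that for $j = 3$ and $j = 4$ the MSE has been replaced by the more easily visualized root mean square error (RMSE).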

Computational  Analysis  of  the 
Integrals  Eih 

To obtain the desired results $R_i$, integrals of the form

$$E_i h = \int_{K_i} F(dx)\,h(x) \tag{23}$$

must be evaluated. The integration is over regions of $N$-dimensional space defined by

$$K_i = \left\{x : r(x) = \frac{\max_j x_j - \bar{x}}{\hat{\sigma}} \in I_i\right\}. \tag{24}$$

The integrands involve joint probability distributions $F$ appropriate to flood magnitudes and functions $h$ involving the sample means and standard deviations of the sample points $x$. It is necessary to do some preliminary manipulations to put these integrals into a form suitable for explicit computation.

Two problems must be resolved. First, can the results for a particular probability distribution $F$ be obtained easily from the results for some related distribution $G$, or must all the work be redone from scratch? For example, would different weight functions have to be used for normal distributions with different means or variances, or would the same weight function be optimal for all normal distributions? Second, how can the implicit definition 24 of the regions $K_i$ be converted into an explicit procedure for generating mesh points to be used in the integrations over $K_i$?

The  first  question  is  answered  by  the  following: 

Assertion.—If a generalized linear transformation $Y = sX + t$ in the underlying random variable $X$ induces a (possibly different) transformation $\theta_Y = s'\theta_X + t'$ in the parameter $\theta$ and in the estimators $\hat{\theta}$ and $\tilde{\theta}$, then

$$E_i h^j(Y) = s_j\,E_i h^j(X) \tag{25}$$

where $h^j$ is as given in equation 20 and

$$s_j = \begin{cases} s' & j = 1, 2 \\ (s')^2 & j = 3, 4, 5 \\ 1 & j = 6. \end{cases} \tag{26}$$

Proof.—The result follows by straightforward change of variables in the multiple integrals defining the expectations (equations 20, 23, and 24).

The parameter $\theta$ and the estimators $\hat{\theta}$ and $\tilde{\theta}$ defined in equations 1-3 satisfy these conditions, so the assertion does apply to them. Moreover, equations 19 and 22 show that the values of $a^*$ and $F(I_i)$ are the same for all distributions of the same statistical type; the RMSE's and biases also will be the same when measured in units of the underlying population standard deviation. (Two distributions are said to be of the same statistical type when their underlying random variables are related by a transformation $X \to sX + t$.) This result means that the computed values of $a^*$, RMSE/$\sigma$, and Bias/$\sigma$ do not depend on the mean $\mu$ and standard deviation $\sigma$ of the assumed underlying distribution $F_0$, but only on its skewness and higher order properties.
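The conditions of the assertion can be checked numerically for the estimators of equations 1-3. The sketch below is illustrative only: the sample, the scale `s`, and the shift `t` are hypothetical, and a positive scale is assumed so that the largest observation stays the largest.

```python
import math
import statistics

def outlier_statistic(sample):
    """r(x): distance of the largest observation from the mean, in sample sd's."""
    return (max(sample) - statistics.mean(sample)) / statistics.stdev(sample)

def full_sample_estimate(sample, kappa):
    """Full-sample estimate: mean plus kappa standard deviations."""
    return statistics.mean(sample) + kappa * statistics.stdev(sample)

x = [1.2, 0.7, 3.9, 2.1, 1.5]            # hypothetical observations
s, t = 2.5, 10.0                          # hypothetical positive scale and shift
y = [s * v + t for v in x]

# The outlier statistic is unchanged by the transformation ...
r_unchanged = math.isclose(outlier_statistic(x), outlier_statistic(y))
# ... while the estimate transforms in the same way as the data.
theta_transforms = math.isclose(
    full_sample_estimate(y, 1.0), s * full_sample_estimate(x, 1.0) + t
)
```

Because r is unchanged, a sample falls in the same r-interval before and after the transformation, which is why the weights and interval probabilities do not depend on the location and scale of the parent distribution.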

We now turn to the construction of the regions $K_i$. As a first step, we show that it is necessary to construct only a portion of the region $K_i$.

Assertion.—Let the estimators $\hat{\theta}$ and $\tilde{\theta}$ be symmetric functions of $x$. (That is, rearrangement of the components of $x$ does not change the value of either estimate. Note that $\hat{\theta}$ and $\tilde{\theta}$ defined in equations 2 and 3 are in fact symmetric.) Moreover, let $G$ consist of those sample points of $R^N$ in which the $N$th component is the largest. That is,

$$G = \{x \in R^N : x_N \ge x_j\ (j = 1, \ldots, N)\}. \tag{27}$$

Then

$$E_i h = N \int_{K_i \cap G} F(dx)\,h(x). \tag{28}$$

Proof.—The integrand is symmetric in the sense defined above. Similarly, $K_i$ is symmetric in the sense that rearranging the components of a point in $K_i$ produces another point in $K_i$. This assertion therefore is simply a restatement of a familiar result on the integration of symmetric functions over symmetric regions.

The region $K_i \cap G$ now can be constructed by choosing the first $N-1$ components of $x$ arbitrarily and seeking appropriate values for the $N$th component. Formally, let $x' = (x_1, \ldots, x_{N-1})$ be any point in $(N-1)$-dimensional space $R^{N-1}$, let $x_N$ be any point in $R^1$, and let $x = (x', x_N) = (x_1, \ldots, x_{N-1}, x_N)$; then $x$ is an arbitrary point in $R^N$. The formal mathematical representation of $K_i \cap G$ is

$$K_i \cap G = \bigcup_{x' \in R^{N-1}} x' \times \zeta_i(x') \tag{29}$$

where $\times$ denotes Cartesian product and

$$\zeta_i(x') = \{x_N \in R^1 : r(x', x_N) \in I_i,\ x_N \ge \max x'\}. \tag{30}$$

An explicit construction of the $\zeta_i(x')$ is given below. The physical interpretation of equation 29 is that the $x' \times \zeta_i(x')$ are segments of "vertical" lines passing through points $x'$ in $R^{N-1}$.

Figure 1 indicates how this idea would be used to represent an annulus in the plane.

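To construct the intervals $\zeta_i(x')$, we start with an arbitrary point $x = (x', x_N)$ in $G$, so that $x_N = \max x_i$. We then seek additional conditions on $x_N$ to ensure that $r(x)$ lies in the specified interval $I_i$. To find these additional conditions, we express $r(x)$ in terms of $x_N = \max x_i$ and the sample mean $\bar{x}'$ and sample standard deviation $s'$ of $x'$ as follows:

$$r(x) = r(x', x_N) = \frac{\max x_i - \bar{x}}{\hat{\sigma}} = \frac{x_N - \bar{x}}{\hat{\sigma}}$$

$$\bar{x} = \frac{(N-1)\bar{x}' + x_N}{N} = \bar{x}' + \frac{x_N - \bar{x}'}{N}$$

$$\begin{aligned}
(N-1)\hat{\sigma}^2 &= \sum_{i=1}^{N-1}\left[x_i - \left(\bar{x}' + \frac{x_N - \bar{x}'}{N}\right)\right]^2 + \left[x_N - \left(\bar{x}' + \frac{x_N - \bar{x}'}{N}\right)\right]^2 \\
&= \sum_{i=1}^{N-1}(x_i - \bar{x}')^2 + (N-1)\left(\frac{x_N - \bar{x}'}{N}\right)^2 + \left(1 - \frac{1}{N}\right)^2 (x_N - \bar{x}')^2 \\
&= (N-1)(s')^2 + (x_N - \bar{x}')^2\left(1 - \frac{1}{N}\right).
\end{aligned}$$

Thus, for any $x = (x', x_N)$ in $G$

$$r(x', x_N) = \frac{(x_N - \bar{x}')\left(1 - \frac{1}{N}\right)}{\left[(s')^2 + (x_N - \bar{x}')^2\,\frac{1}{N}\right]^{1/2}}$$

or

$$r(x', x_N) = \frac{\left(1 - \frac{1}{N}\right)\left(\frac{x_N - \bar{x}'}{s'}\right)}{\left[1 + \frac{1}{N}\left(\frac{x_N - \bar{x}'}{s'}\right)^2\right]^{1/2}}. \tag{31}$$

Figure 1.—Representation of an annulus as $\bigcup_{x \in U} x \times \zeta(x)$.

It will be noted that

$$r(x', x_N) = \sqrt{N}\left(1 - \frac{1}{N}\right)\sqrt{f_N(t)} \tag{32}$$

where

$$t = \frac{x_N - \bar{x}'}{s'} \tag{33}$$

and

$$f_N(t) = \frac{t^2}{N + t^2}. \tag{34}$$

The function $f_N(t)$ has the general shape sketched in figure 2; it increases monotonically from 0 to 1 with increasing $t$. Thus, for any sample size $N$, the $r$-statistic is bounded as follows:

$$0 \le r(x) < \sqrt{N}\left(1 - \frac{1}{N}\right); \tag{35}$$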
this result justifies the use of a sufficiently fine finite step function $a$ as an arbitrarily good approximation of a general weight function. Because $f_N$ is monotone, the $t$-value $f_N^{-1}(y)$ corresponding to $y = f_N(t)$ can be computed. The result is

$$f_N^{-1}(y) = \sqrt{\frac{Ny}{1 - y}}. \tag{36}$$

Using this result, the $x_N$-intervals $\zeta_i(x')$ can be constructed.

Figure 2.—The function $f_N(t)$ ($N = 25$).

Let the $r$-interval $I_i$ be specified by

$$I_i = (r_i^L, r_i^U) \tag{37}$$

with $r_i^U$ less than the upper bound $\sqrt{N}\left(1 - \frac{1}{N}\right)$ and $r_i^L$ greater than 0. Let the point $x' = (x_1, \ldots, x_{N-1})$ be given in $R^{N-1}$. The interval $\zeta_i(x')$ contains those points $x_N$ such that

$$x_N \ge \max x' \tag{38}$$

and

$$r_i^L < r(x', x_N) = \sqrt{N}\left(1 - \frac{1}{N}\right)\sqrt{f_N(t)} < r_i^U \tag{39}$$

or

$$(r_i^L)^2 < N\left(1 - \frac{1}{N}\right)^2 f_N(t) < (r_i^U)^2. \tag{40}$$

Thus, $t$ must lie between the limits

$$z_i^j = f_N^{-1}\!\left(\frac{(r_i^j)^2}{N\left(1 - \frac{1}{N}\right)^2}\right), \qquad j = L, U. \tag{41}$$

From equation 36,

$$z_i^j = \frac{\sqrt{N}\,r_i^j}{\left[N\left(1 - \frac{1}{N}\right)^2 - (r_i^j)^2\right]^{1/2}}, \qquad j = L, U. \tag{42}$$

The corresponding limits on $x_N$ are $\bar{x}' + z_i^j s'$ ($j = L, U$). Because $x_N$ must also be larger than any of the first $N-1$ components, the lower limit of $\zeta_i(x')$ is

$$\zeta_i^L(x') = \max\,(\max x',\ \bar{x}' + z_i^L s') \tag{43}$$

whereas the upper limit is

$$\zeta_i^U(x') = \bar{x}' + z_i^U s'. \tag{44}$$

If $\zeta_i^L(x')$ should exceed $\zeta_i^U(x')$, the interval would be empty. In this case, there could be no sample $X$ having its smallest $N-1$ components equal to $x'$ and yet having $r(X)$ in the interval $I_i$.
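The interval construction can be sketched and checked against a direct evaluation of the r-statistic. The sketch rests on two assumptions that should be verified against the printed equations: the full-sample standard deviation uses the N-1 divisor, and the standard deviation s' of the N-1 retained points also uses the divisor N-1 (which for those points is `statistics.pstdev`). The function names and the example values are hypothetical.

```python
import math
import statistics

def r_statistic(sample):
    """Direct evaluation of r(x) for a complete sample."""
    return (max(sample) - statistics.mean(sample)) / statistics.stdev(sample)

def x_n_interval(x_prime, r_low, r_high):
    """Limits on the largest observation so that r falls in (r_low, r_high).

    x_prime : the N-1 smaller observations
    Returns (lower, upper), or None when the interval is empty.
    """
    n = len(x_prime) + 1
    mean_p = statistics.mean(x_prime)
    s_p = statistics.pstdev(x_prime)          # divisor N-1 = len(x_prime), assumed
    bound = math.sqrt(n) * (1.0 - 1.0 / n)    # upper bound on r

    def z(r):
        # Limit on t = (x_N - mean_p) / s_p corresponding to a given r.
        if r >= bound:
            return math.inf
        return r * math.sqrt(n) / math.sqrt(bound ** 2 - r ** 2)

    lower = max(max(x_prime), mean_p + z(r_low) * s_p)
    upper = mean_p + z(r_high) * s_p
    return (lower, upper) if lower <= upper else None

limits = x_n_interval([1.0, 2.0, 3.0, 4.0], r_low=1.2, r_high=1.7)
```

Appending either limit to the four retained points and recomputing r directly should return the corresponding end of the r-interval, which gives a quick consistency check on the algebra.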

Using this representation of $K_i \cap G$, the integrals
28 may now be represented as iterated integrals

$\int_{K_i \cap G} F(dx)\,h(x) = \int_{R^{N-1}} F'(dx') \int_{\zeta_i(x')} F_0(dx_N)\,h(x', x_N)$  (45)

where

$F'(dx') = \prod_{n=1}^{N-1} F_0(dx_n)$

is the joint distribution of $N - 1$ independent observations
from the underlying distribution $F_0$ and
$h(x', x_N) = h(x)$ with $x = (x', x_N)$.

These integrals may be evaluated as follows. For
any $x'$ in $R^{N-1}$, the interval $\zeta_i(x')$ is defined by
equations 43 and 44. The integral with respect to
$F_0$ over this interval for this $x'$ is an ordinary one-dimensional
integral which can be evaluated either
analytically or numerically. If $\zeta_i(x')$ is empty, the
integral is zero. Thus the inner integral is a well-defined
function $H_i(x')$ for each $x'$ in $R^{N-1}$. The
$(N-1)$-dimensional outer integral over all of $R^{N-1}$,

$\int_{R^{N-1}} F'(dx')\, H_i(x')$,

now can be evaluated by straightforward Monte
Carlo sampling from $F'$.
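
The two-stage evaluation described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' program: the function names are hypothetical, the parent distribution is taken to be a unit exponential, the integrand is h = 1, and the interval is simplified to everything above max x' (the constraint of equation 38 alone). Under those assumptions the exact answer is known, since the outer integral is the probability that the last observation is the largest of N, namely 1/N.

```python
import math
import random

N = 10  # sample size (assumed for the illustration)

def exp_cdf(x):
    """CDF of the unit exponential parent distribution F0 (an assumed choice)."""
    return 1.0 - math.exp(-x) if x > 0.0 else 0.0

def sample_x_prime(rng):
    """Draw x' = (x_1, ..., x_{N-1}) as independent observations from F0."""
    return [rng.expovariate(1.0) for _ in range(N - 1)]

def inner_integral(x_prime, upper=math.inf):
    """One-dimensional integral of F0(dx_N) over [max x', upper] with h = 1.

    Returns zero when the interval is empty, as in the text.
    """
    lower = max(x_prime)
    if upper <= lower:
        return 0.0
    cdf_upper = 1.0 if math.isinf(upper) else exp_cdf(upper)
    return cdf_upper - exp_cdf(lower)

def mc_outer_integral(n_points, rng):
    """Average the inner integral H(x') over Monte Carlo points x' drawn from F'."""
    total = 0.0
    for _ in range(n_points):
        total += inner_integral(sample_x_prime(rng))
    return total / n_points

rng = random.Random(1971)
estimate = mc_outer_integral(20000, rng)
print(f"Monte Carlo estimate {estimate:.4f}, exact value {1.0 / N:.4f}")
```

Because the inner integral is done exactly for each x', only the (N-1)-dimensional outer integral carries Monte Carlo sampling error.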

Summary  of  Computational  Formulas 


Let the nonoverlapping intervals $I_i = (r_i^L, r_i^U)$ be
given, along with the parent distribution $F_0$ and the
sample size $N$. The r-interval specifications should
satisfy the inequality

$0 < r_i^L < r_i^U < \sqrt{N}\,(1 - 1/N)$  (46)

Let the parameter $\theta$ be $\mu + \kappa\sigma$, where $\mu$ and $\sigma$ are
the population mean and standard deviation and
$\kappa$ is a specified constant. The final results desired
are
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RMSE(^|/,) 
RMSE(^|/,) 
Bias(^|/0 
_Bias(^|/i) 


EVE] 

El 
VE-I/E'i 


VEwFlEfriEWi 

EjlEI 
_  E]IE'i-E^EllEfEl 


(47) 


where 


$E^j = N \int_{R^{N-1}} F'(dx') \int_{\zeta_i(x')} F_0(dx_N)\, h^j(x', x_N)$.  (48)

The probability distributions $F_0$ and $F'$ are the given
parent distribution and the $(N-1)$-dimensional
joint distribution of independent observations from
$F_0$.

For any point $x' = (x_1, \ldots, x_{N-1})$ in $R^{N-1}$, the
interval $\zeta_i(x')$ is

$\zeta_i(x') = [\zeta_i^L(x'),\ \zeta_i^U(x')]$ if $\zeta_i^L(x') \le \zeta_i^U(x')$, and is empty otherwise,

where

$\zeta_i^L(x') = \max\{\max x',\ \bar{x}' + z^L s'\}$
$\zeta_i^U(x') = \bar{x}' + z^U s'$,


and 


ij=L,U). 


(49) 


(50) 


(51) 


The  sample  mean  and  standard  deviation  of  x'  are 
-_  1 


Xi 


1  V-l   


N 


The  integrands  h^{x\  x\)  are  given  by 


h  = 


.  1 

h 

d-e 

h- 

~d-~d 

h' 

A' 

Ce-~dY 

h'' 

{6-e)(d-~e) 

A« 

1 

(52) 


where, 


^=  x+  Kcr 
—  1  V 


(53) 


The  results  of  these  computations  define  the 
weight  function  to  be  used  in  the  estimator: 


where, 
and 


d=~e-a{r{X)){e-~e)  (54) 

a{r{X))  =  a*       for  r(^)e/i  (55) 

tv\  —  maxA"  — ^ 

r(A)  =  :   (56) 


and  permit  a  comparison  of  this  estimator's  per- 
formance with  f^"s  performance. 

Computational  Results 

Numerical computations of the optimal weight
function a* and its associated statistical performance
measures were undertaken for several
parent distributions, sample sizes, and values of
$\kappa$ in the parameter $\theta = \mu + \kappa\sigma$. The emphasis was
on lognormal distributions with skew coefficients
between 2.0 and 6.0; in addition, several cases
with exponential, log Pearson, and Pareto parent
distributions were computed. The values of $\kappa$ considered
were clustered around -0.3, 1.0, and 3.5
(representing the 2-year, 10-year, and 100-year
floods for the parent distributions). Sample sizes
$N$ were 10, 25, and 50. The cases considered are
identified in table 1.

The integrals $E^j$ given by equation 48 were
evaluated by straightforward Monte Carlo sampling
of points $x'$ in $R^{N-1}$, followed by numerical integration
over the intervals $\zeta_i(x')$. The r-intervals for
each case were constructed from percentage points
of the empirical distribution of a preliminary sample
of 200 r-values. Only the upper half of this distribution
was considered. Each r-interval held about
10 percent of the samples, and replicate runs were
made to obtain results at 10 to 15 r-values for each
case. At least 1,000 Monte Carlo points $x'$ were
generated in each run; the resulting estimated
Monte Carlo sampling error (estimated standard

Table 1.-Problem identification

Code         Parent         Skew    $\kappa$

LN-H-100     lognormal      6.18     3.97
LN-H-10      lognormal      6.18      .90
LN-H-2       lognormal      6.18     -.30
LN-M-100     lognormal      3.69     3.88
LN-M-10      lognormal      3.69     1.08
LN-I-100     lognormal      2.45     3.66
LN-I-10      lognormal      2.45     1.20
LN-I-2       lognormal      2.45     -.26
LN-L-100     lognormal      1.26     3.18
LN-L-10      lognormal      1.26     1.30
LN-L-2       lognormal      1.26     -.18
EX-100       exponential    2.00     3.60
EX-10        exponential    2.00     1.30
EX-2         exponential    2.00     -.31
LP-H-100     log Pearson    6.19     3.97
LP-H-10      log Pearson    6.19      .90
LP-IM-100    log Pearson    3.33     3.50
LP-IM-2      log Pearson    3.33     -.30
LP-I-100     log Pearson    2.45     3.70
LP-I-10      log Pearson    2.45     1.20
PA-H-10      Pareto         6.14      .97
PA-I-10      Pareto         2.45     1.24


deviation of the final Monte Carlo estimates) was
generally of the order of 5 or 10 percent. Results of
replicate runs generally agreed to within the estimated
Monte Carlo sampling error.
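
The construction of r-intervals from a preliminary sample can be sketched as below. This is a minimal illustration, not the authors' code: the helper names are hypothetical, the r-statistic is taken as (max X - mean)/s with s computed using the N - 1 divisor (an assumption, chosen because it reproduces the upper bound $\sqrt{N}(1 - 1/N)$), and a standard lognormal parent is used purely as an example.

```python
import math
import random
import statistics

def r_statistic(sample):
    """Relative size of the largest observation: (max - mean) / s (N - 1 divisor assumed)."""
    return (max(sample) - statistics.fmean(sample)) / statistics.stdev(sample)

def upper_half_intervals(r_values, width_percent=10):
    """Split the upper half of the empirical r-distribution into equal-percentage intervals."""
    ordered = sorted(r_values)
    n = len(ordered)
    # Cut points at the 50th, 60th, ..., 100th percentage points of the sample.
    cuts = [ordered[min(n * pct // 100, n - 1)]
            for pct in range(50, 101, width_percent)]
    return list(zip(cuts[:-1], cuts[1:]))

rng = random.Random(1971)
N = 10
# Preliminary sample of 200 r-values, as in the text; lognormal parent is illustrative.
r_values = [r_statistic([rng.lognormvariate(0.0, 1.0) for _ in range(N)])
            for _ in range(200)]
intervals = upper_half_intervals(r_values)
for lower, upper in intervals:
    print(f"r-interval: ({lower:.3f}, {upper:.3f})")
```

Each interval then holds about 10 percent of the preliminary r-values, and all of them lie below the bound of equation 46.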

The optimal weight functions for the 10-year and
100-year events are plotted in figures 3 to 5; the
weight functions represented in these figures are
identified in table 1. In most cases, the optimal
weights were negative at small r-values; they have
been truncated at zero in the figures because the
RMS errors and biases of the weighted and unweighted
estimates were the same whenever the
optimal weight was negative. The trend of
the weight function in these cases is generally
what might be expected: the value of a* (representing
a discounting of the largest observation)
increases with r, the relative size of the largest
observation. The negative values of the weight
presumably indicate that when the relative size of
the largest observation is small, the ordinary
estimate tends to be an underestimate of $\theta$ and
should have something added to it. (In fact, the
ordinary estimate is negatively biased at small r-values
and positively biased at large ones.) It seems fairly
clear that the largest observation should not be
discounted in this case (and the lack of improvement
in RMS error and bias supports this contention), so we
have chosen instead to use the full-sample estimate,
that is, to truncate the weights at zero.

The  weight  functions  for  2-year  events  did  not 
conform  to  this  general  pattern.  Instead,  they  had 
generally  negative  slopes  with  occasional  wild 
wiggles  at  high  r-values.  In  most  cases,  these 
weights  were  greater  than  1.  As  would  be  expected, 
the  RMS  errors  of  the  optimally  weighted  esti- 
mate were  smaller  than  those  of  the  ordinary 
estimate  of  the  2-year  events,  but  the  improvement 
was  not  as  striking  as  for  the  10-year  and  100-year 
events  and  did  not  take  the  form  we  had  antici- 
pated. We  have  no  satisfactory  explanation  of  the 
results  for  the  2-year  events,  but  suspect  they 
come  from  an  inappropriate  choice  of  the  weighted 
estimator  in  this  case.  Accordingly,  we  have  not 
attempted  to  incorporate  these  results  in  our  gen- 
eral weight  function. 

To get a compact representation of the weight
functions a* (for 10-year and 100-year events)
plotted in figures 3 to 5, we decided to use a very
rough representation in terms of straight line segments;
these seemed to fit the data as well as any
more complicated curve. Further, the primary
variables affecting the shape and location of the
computed weight functions appear to be the sample
size and skew; the underlying population type and
the parameter $\theta$ being estimated seem to be of
secondary importance (and, in terms of the straight
line representation, entirely negligible). Recalling
that the maximum possible value of the r-statistic
for a sample of size $N$ is approximately $\sqrt{N}$, we
argued that all the weight functions should equal
1 for $r \ge \sqrt{N}$ and should slope downward to the
left at slopes depending on the (population) skew
coefficients for the various problems. The resulting
approximate weight function a* is defined as
follows:

$a^*(r, N, \gamma) = \max(0, \min(1, 1 - S(\sqrt{N} - r)))$  (57)

where $\gamma$ is the (population) skew coefficient and

$S = 7/N$ if $N \ge 95$,
$S = 7/N + \gamma(0.062 - 0.00065N)$ if $N < 95$.  (58)

To evaluate this function in practice, one would
have to use a sample estimate of the skew. This
function is plotted in figures 3 to 5 for $\gamma = 2.0$ and
$\gamma = 6.0$.

Figure 3.-Weight functions a*, sample size = 10.
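
Equations 57 and 58 translate directly into a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the constant 7 in the slope is taken as printed in equation 58.

```python
import math

def slope(n, skew):
    """Slope S of the straight-line weight function (equation 58)."""
    if n >= 95:
        return 7.0 / n
    return 7.0 / n + skew * (0.062 - 0.00065 * n)

def approx_weight(r, n, skew):
    """Approximate weight a*(r, N, gamma) of equation 57, clipped to [0, 1]."""
    return max(0.0, min(1.0, 1.0 - slope(n, skew) * (math.sqrt(n) - r)))

# Example: sample size 10, skew 6.0, at three values of the r-statistic.
for r in (2.0, 2.5, math.sqrt(10)):
    print(f"r = {r:.3f}  a* = {approx_weight(r, 10, 6.0):.3f}")
```

The two branches of equation 58 nearly coincide at N = 95, where the skew term 0.062 - 0.00065N falls to 0.00025, so the slope is effectively continuous in N.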

This  approximate  a  function  was  tested  by 
straightforward  simulation  of  the  estimators  6, 
and  6.  This  time  the  sample  estimates  of  the  skew 
coefficient,  not  the  population  skews,  were  used  as 
arguments  of  the  a  function  in  forming  the  esti- 
mates 6.  Typical  results  of  these  tests  are  shown  in 
figures 6 to 14. All results are expressed in units of
the standard deviation $\sigma$ of the underlying population,
so they are dimensionless and independent of
the population mean and standard deviation. Solid
plotting points represent results based on the
computed weight function a* for each case; hollow
plotting points represent results based on the
approximate weight function a* (evaluated at the
observed skew value for each sample). Short
vertical lines represent intervals of one estimated
standard deviation of the final Monte Carlo estimates
above and below the central points. Small
numerals next to the hollow plotting points represent
the fraction of all samples represented by that
point. The leftmost hollow plotting points represent
all samples lying below the (approximately) 55th
percentile of all observed r-values.
The RMSE of the weighted (a*) and full-sample
estimates are summarized in table 2. The
entries labelled "Upper 10 percent" refer to those
samples having r-values above the (approximately)
90th percentile of all observed r-values. The
"Overall" entries refer to the entire Monte Carlo
sample.

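Statistics on the censored-sample estimates
also were recorded. For samples in the top 10
percent of r-values, the RMSE of the censored-sample
estimate generally was somewhat smaller than that of
the full-sample estimate but considerably larger than
that of the weighted estimate. Occasionally, the RMSE
of the censored-sample estimate was even greater than
that of the full-sample estimate. The bias of the
censored-sample estimate was negative and generally
of greater magnitude than that of the full-sample
estimate; apparently the censored-sample estimate
underestimates $\theta$ more than the full-sample estimate
overestimates it.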

The noteworthy feature of these results is the
potentially great reduction in RMSE obtained by
using a weighted average of the full-sample and
censored-sample estimates. For samples
drawn from skewed populations and having r-values
in the upper 10 percent, the RMS errors of the
estimated 10-year and 100-year floods are cut in
half by using the weighted estimate instead of the
full-sample estimate. The absolute magnitude of
the improvement ranges up to about 2.5 population
standard deviations (100-year flood, high skew,
sample size 10). When the sample does not contain
a large outlier or when the 2-year flood is being
estimated, the weighted and full-sample estimates
are of equivalent accuracy.
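
The size of the reduction can be checked against the "Upper 10 percent" entries of table 2 for the high-skew lognormal cases. The short calculation below is only a worked check; it assumes that the first pair of RMSE columns in table 2 refers to the full-sample estimate and the second pair to the weighted estimate.

```python
# (problem, sample size) -> (full-sample RMSE/sigma, weighted RMSE/sigma),
# upper 10 percent of r-values, read from table 2.
upper10 = {
    ("LN-H-100", 10): (4.73, 2.19),
    ("LN-H-100", 25): (4.11, 1.62),
    ("LN-H-100", 50): (3.31, 1.30),
    ("LN-H-10", 10): (1.37, 0.69),
    ("LN-H-10", 25): (1.16, 0.51),
    ("LN-H-10", 50): (0.91, 0.38),
}

# Ratio of weighted to full-sample RMSE, and the largest absolute improvement.
ratios = {key: weighted / full for key, (full, weighted) in upper10.items()}
largest_gain = max(full - weighted for full, weighted in upper10.values())

for key, ratio in ratios.items():
    print(key, f"ratio = {ratio:.2f}")
print(f"largest improvement = {largest_gain:.2f} population standard deviations")
```

For these cases the ratios fall between roughly 0.4 and 0.5, and the largest improvement (LN-H-100, sample size 10) is about 2.5 population standard deviations, in line with the statements above.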


Table 2.-Summary of root mean square errors

                        RMSE/$\sigma$, full-sample     RMSE/$\sigma$, weighted
                        estimate                estimate
Problem      Sample     Overall   Upper 10      Overall   Upper 10
             size                 percent                 percent

LN-H-100       10        2.74      4.73          2.23      2.19
               25        2.10      4.11          1.72      1.62
               50        1.68      3.31          1.39      1.30
LP-IM-100      10        1.92      3.68          1.59      1.62
               25        1.39      2.57          1.15      1.07
               50        1.06      2.00           .89       .88
LP-I-100       50         .93      1.65           .81       .82
LN-I-100       10        (entries illegible)
               25        1.18      1.55          1.09       .98
               50         .89      1.22           .84       .79
EX-100         10        1.68      2.14          1.57      1.62
               25        1.12      1.37          1.08      1.11
               50         .81       .96           .79       .76
LN-L-100       10        1.25      1.92          1.14      1.26
               25         .83      1.16           .77       .75
               50         .60       .75           .57       .53
LN-H-10        10         .83      1.37           .68       .69
               25         .61      1.16           .51       .51
               50         .47       .91           .40       .38
LP-I-10        25         .51       .85           .46       .45
LN-I-10        10         .76      1.16           .69       .71
               25         .50       .67           .47       .47
               50         .37       .47           .35       .34
EX-10          10         .78       .93           .75       .80
               25         .51       .60           .50       .54
               50         .37       .42           .36       .36
LN-L-10        10         .67       .93           .62       .65
               25         .43       .57           .41       .40
               50         .31       .38           .30       .29

                        RMSE/$\sigma$, full-sample     RMSE/$\sigma$, weighted
Problem      Sample     estimate                estimate
             size

LN-H-2         10         .18                     .17
               25         .12                     .11
LP-IM-2        25         .14                     .13
LN-I-2         10         .24                     .24
               25         .15                     .15
EX-2           10         .24                     .24
               25         .15                     .15
               50         .10                     .10
LN-L-2         10         .28                     .28
               25         .17                     .17
               50         .13                     .13
Figure 6.-RMS error of 100-year flood, LN-H-100.
Figure 7.-RMS error of 100-year flood, LN-I-100.
Figure 8.-RMS error of 100-year flood, EX-100.
Figure 11.-Bias of 100-year flood, LN-H-100.
Figure 12.-Bias of 100-year flood, LN-I-100.
Figure 14.-Bias of 10-year flood, LN-H-10.


A  PRECIPITATION  DATA  SIMULATOR  USING  A  SECOND  ORDER 

AUTOREGRESSIVE  SCHEME 

By E. H. Wiser ¹


Abstract 

It  is  assumed  that  for  fixed  weather  conditions, 
the  negative  binomial  distribution  can  be  used  to 
fit  the  observed  distribution  of  precipitation 
amounts.  Varying  weather  conditions  are  then 
interpreted  as  changing  the  values  of  the  param- 
eter of  the  distribution. 

Weather  conditions  vary  in  a  quasi-oscillatory 
manner  which  may  be  described  by  a  second-order 
autoregressive  scheme  (SOARS).  The  SOARS  is 
used  to  generate  a  "Storminess"  parameter,  which 
varies  through  time  in  the  desired  manner.  Rate 
and  direction  of  weather  movement  are  also 
simulated  to  produce  simultaneous  values  of  the 
parameter  at  as  many  points  as  required.  These 
values  are  converted  by  functions  for  each  point 
into  the  parameters  of  the  negative  binomial  dis- 
tribution from  which  point  rainfall  amounts  are 
obtained  at  each  point. 

Use  of  the  simulator  is  demonstrated  with  data 
for  several  North  Carolina  stations,  and  the  process 
of  fitting  the  parameters  is  described. 
Introduction

Precipitation data simulation has become extensive
in recent years. Simulator output has been
used to study agricultural water requirements,²
and its use in combination with catchment simulators
such as the Stanford Watershed Model to
generate streamflow data has also drawn interest.³

Development of the model reported here resulted
from the need to study a problem that could not be
solved using earlier models. If a diversion were
planned to carry water from one basin to another,
modeling of the hydrologic consequences would
require simultaneous simulation of rainfall over
both basins.

Existing models, such as Pattison's or the one
developed by the writer,⁴ appeared to work satisfactorily
for a single location but did not lend
themselves readily to simultaneous generation at
several locations, unless independence between
locations could be safely assumed. As an alternative,
the writer reverted to a suggestion by W. E.
Splinter (oral communication), made at the time
that the earlier model was developed, that the
apparently cyclic behavior of precipitation due to
airmass movements might be explicitly used in a
model.

Accordingly, occurrence of precipitation at a point
was assumed to result from two processes. The first
was a storm process which simulated the relative
likelihood of precipitation occurrence (referred to
in this paper as a "storminess parameter") corresponding
to movements of frontal systems over
the point. The second was a local process which
included both a deterministic component (for example,
orographic effects) and a random
component that determines the simulated precipitation.
Since the first process would be generally
applicable over a region, a value of the storminess
parameter that occurs at a point would also occur
at any other point, either earlier or later in time.
This would introduce an element of correlation
into the simulated results in somewhat the same
way that it occurs in nature.

² Wiser, E. H. IRRIGATION PLANNING USING CLIMATOLOGICAL
DATA. Proc. Amer. Soc. Civ. Engin. 91(IR4):1-11. 1965.

³ Pattison, A. SYNTHESIS OF HOURLY RAINFALL DATA. Water
Resources Res. 1(4):489-498. 1965.

⁴ Wiser, E. H. MONTE CARLO METHODS APPLIED TO PRECIPITATION
FREQUENCY ANALYSES. Trans. Amer. Soc. Agr. Engin.
9(4):538-542. 1966.

Simulation  in  this  manner  consists  of  a  three-step 
process:  (1)  Computation  of  the  storminess  param- 
eter at  the  point;  (2)  conversion  of  the  parameter 
to  introduce  the  local  deterministic  component; 
and  (3)  determination  of  a  precipitation  amount 
using  random  selection  from  a  distribution  whose 
parameters  are  varied  in  accordance  with  the  first 
two  steps. 

The  resulting  simulated  amounts  will  have  a 
mixed  distribution,  the  form  of  which  will  depend 
on  the  distribution  of  the  storminess  parameter, 
the  form  of  the  conversion  used,  and  the  type  of 
local  distribution.  Since  there  would  be  considerable 
advantages  in  being  able  to  specify  the  form  of  the 
mixed  distribution,  some  effort  was  made  to  use 
known  mixed  forms.  For  example,  it  is  known  that 
if  the  storminess  parameter  were  Poisson  dis- 
tributed, if  the  conversion  were  such  that  the 
result  remained  Poisson  distributed,  and  if  the  local 
random  process  used  a  Gamma  distribution,  then 
the  resulting  amounts  would  be  negative  binomially 
distributed with known coefficients. It was found,
however, that the constraints on the overall model
introduced in this manner were too serious to
permit satisfactory development. It was therefore
decided that the individual steps should be developed
in as logical a manner as possible, and the
results studied by Monte Carlo methods if necessary.

The  remainder  of  this  paper  describes  the  forms 
now  used  for  the  three  steps,  discusses  methods 
of  parameter  fitting,  and  illustrates  use  of  the 
simulator  for  a  North  Carolina  catchment. 


The Storminess Parameter

The  storminess  parameter  is  intended  to  simulate 
the  relative  probability  of  precipitation,  with  a 
high  value  of  the  parameter  implying  a  strong 
probability.  As  previously  mentioned,  the  parameter 


should  exhibit  a  cyclic  behavior  similar  to  that 
caused  by  movements  of  airmasses. 

Some  consideration  was  given  to  combinations 
of  purely  periodic  functions.  This  would,  however, 
imply  repetition  after  some  point  of  time.  Further- 
more, the  variations  in  amplitude  that  would  be 
required  would  also  be  difficult  to  model. 

As an alternative, the second order autoregressive
scheme (SOARS) was used. This may be
written in the form

$q_t = a_1 q_{t-1} + a_2 q_{t-2} + \sigma \eta_t$,  (1)

where

$q_t$ is the storminess parameter at time $t$,
$a_1$, $a_2$, and $\sigma$ are coefficients, and
$\eta_t$ is a random mutually uncorrelated variable
with mean zero and variance one.

The mean value of $q$ is zero, and the variance is
given by ⁵

$\mathrm{var}(q) = \dfrac{(1 - a_2)\,\sigma^2}{(1 + a_2)\left[(1 - a_2)^2 - a_1^2\right]}$.  (2)


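The SOARS is known to be quasi-periodic provided
that

$-1 \le a_2 < 0$,
$a_1^2 + 4a_2 < 0$.  (3)

In this case, the correlogram is given by ⁶

$\rho_s = (-a_2)^{s/2}\, \dfrac{\sin(s\theta + \psi)}{\sin\psi}$,  (4)

where $s$ is the lag, and $\theta$ and $\psi$ are given by

$\cos\theta = \dfrac{a_1}{2\sqrt{-a_2}}$,  $\tan\psi = \dfrac{1 - a_2}{1 + a_2}\tan\theta$.  (5)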


The characteristics of $q$ can be determined by
examination of equation 4. When $a_2 = -1$, the correlogram
is purely periodic, and, consequently,
the values of $q_t$ will be periodic also. On the other
hand, when $a_2$ approaches 0, the correlogram is
strongly damped, there is much less correlation,
and periodicity is less apparent. Also, the average
period of the correlogram is $\lambda = 2\pi/\theta$.

⁵ Cox, D. R., and Miller, H. D. THE THEORY OF STOCHASTIC
PROCESSES. J. Wiley & Sons, New York.

⁶ Bartlett, M. S. AN INTRODUCTION TO STOCHASTIC PROCESSES.

Typical realizations of the SOARS are given in
figures 1 and 2, illustrating the effect of varying
the parameters. Instead of directly changing $a_1$,
$a_2$, and $\sigma$, the parameters being used are $\mu = \sqrt{-a_2}$,
$\lambda = 2\pi/\theta$, and $v = \mathrm{var}(q)$. The manner in which $\mu$
controls the shape of the realization can be seen
by comparing figures 1A and 1B. The form required
for the storminess parameter is evidently obtained
for $\mu$ values between 0.95 and 0.99. The effect of
varying the wavelength $\lambda$ is illustrated in figure 2A,
which clearly shows the increasing frequency of
oscillation with reduced wavelength. The effect
of the variance $v$ is shown in figure 2B. Only the
scale of $q$ is changed, being a function of the square
root of $v$.
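
A realization of the SOARS can be generated from the three working parameters as sketched below. This is an illustration under stated assumptions rather than the writer's program: the function names are hypothetical, the random variable in equation 1 is taken to be Gaussian (the text requires only mean zero and variance one), and the conversion to $a_1$, $a_2$, and $\sigma$ uses the standard second-order autoregressive relations $a_2 = -\mu^2$, $a_1 = 2\mu\cos(2\pi/\lambda)$, and the stationary variance of the process.

```python
import math
import random

def soars_coefficients(mu, wavelength, variance):
    """Convert (mu, lambda, v) to the coefficients (a1, a2, sigma) of equation 1."""
    a2 = -mu ** 2
    theta = 2.0 * math.pi / wavelength
    a1 = 2.0 * mu * math.cos(theta)
    # Invert the stationary-variance relation of the scheme to obtain sigma^2.
    sigma2 = variance * (1.0 + a2) * ((1.0 - a2) ** 2 - a1 ** 2) / (1.0 - a2)
    return a1, a2, math.sqrt(sigma2)

def simulate_soars(mu, wavelength, variance, n_steps, rng, burn_in=1000):
    """Generate n_steps values of q_t after discarding a burn-in period."""
    a1, a2, sigma = soars_coefficients(mu, wavelength, variance)
    q_prev2 = q_prev1 = 0.0
    series = []
    for step in range(burn_in + n_steps):
        q = a1 * q_prev1 + a2 * q_prev2 + sigma * rng.gauss(0.0, 1.0)
        q_prev2, q_prev1 = q_prev1, q
        if step >= burn_in:
            series.append(q)
    return series

rng = random.Random(1971)
# Illustrative parameter values; mu is kept moderate so the check converges quickly.
q = simulate_soars(mu=0.75, wavelength=24.0, variance=0.5, n_steps=100_000, rng=rng)
sample_var = sum(x * x for x in q) / len(q)
print(f"sample variance of q = {sample_var:.3f} (target 0.5)")
```

Values of $\mu$ near 0.95 to 0.99 give the slowly varying, quasi-oscillatory series the text calls for, at the cost of longer runs before sample statistics settle.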

Since  q  is  defined  only  as  a  storminess  parameter, 
it  is  not  possible  to  fit  the  SOARS  parameters  from 
data.  The  only  direct  fitting  possible  is  to  require 
that  the  average  period  X  agree  with  the  relative 
frequency  of  frontal  passages  over  the  point.  Varia- 
tion of  the  other  two  parameters  effectively  controls 
the  amount  of  variance  in  the  simulated  precipita- 
tion amounts,  which  is  ascribed  to  the  storminess 
rather  than  to  local  effects.  Further  discussion  of 
this  is  deferred  to  a  later  section. 

The  above  development  describes  variation  of  the 
storminess  parameter  through  time  at  a  single 
point.  To  introduce  spatial  effects,  the  assumption 
is  made  that  a  parameter  specified  over  a  point  at  a 
given  time  moves  over  another  point  at  the  next 
time  interval.  The  relative  location  of  the  second 
point  is  determined  both  by  the  rate  and  direction 
of  movement  of  the  airmass.  Initial  applications 
of  the  simulator  have  assumed  both  a  constant  rate 
of  movement  and  an  east-west  direction.  However, 
random  variations  in  both  could  be  introduced 
readily and would certainly be more realistic. Such
variations,  like  the  average  storm  period,  would 
be  fitted  to  general  meteorological  conditions  rather 
than  to  observed  precipitation  data. 
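
With a constant rate of movement, the spatial assumption amounts to a time shift of the storminess series. The sketch below is an illustration only: the function names are hypothetical, and it assumes that station positions are measured in minutes of longitude from the baseline in the direction of airmass movement, so that a value over the baseline at time t reaches the station a whole number of hours later.

```python
def lag_hours(station_minutes, rate_minutes_per_hour):
    """Whole hours needed for the airmass to travel from the baseline to the station."""
    return round(station_minutes / rate_minutes_per_hour)

def storminess_at_station(q_baseline, station_minutes, rate_minutes_per_hour):
    """Shift the baseline storminess series by the travel time to the station.

    Hours for which no baseline value is available yet are returned as None.
    """
    lag = lag_hours(station_minutes, rate_minutes_per_hour)
    n = len(q_baseline)
    return [q_baseline[t - lag] if 0 <= t - lag < n else None for t in range(n)]

# Example: a station 60 minutes of longitude downstream, airmass moving 30 minutes per hour.
q_baseline = [0.1, 0.2, 0.3, 0.4, 0.5]
q_station = storminess_at_station(q_baseline, station_minutes=60.0,
                                  rate_minutes_per_hour=30.0)
print(q_station)
```

Random variation in rate or direction, as suggested in the text, would replace the fixed lag with one that changes from hour to hour.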

The  Local  Process 

The  local  process  includes  both  a  deterministic 
component  and  a  random  component.  Since  the 


characteristics  of  the  former  are  determined  by  the 
latter,  the  random  component  will  be  described 
first. 

It  is  assumed  that,  for  a  given  storminess  value 
at  a  given  location,  a  fixed  amount  of  precipitation 
need  not  result.  Instead,  precipitation  amounts 
would  be  distributed  according  to  a  given  distribu- 
tion. The  amount  at  a  given  time  is  assumed  to 
occur  randomly  from  the  given  distribution,  and  its 
selection  from  the  distribution  is  independent  of 
amounts  during  preceding  time  periods  and  the 
value  of  the  storminess  parameter. 

The distribution being used is the negative
binomial distribution, which may be written as

$f(x) = \binom{k + x - 1}{x} \dfrac{p^x}{(1 + p)^{k + x}}$,  (6)

where $x$ is the hourly precipitation in 0.01 inches,
and $k$ and $p$ are nonnegative parameters.
The mean value of $x$ is

$E(x) = kp$,  (7)

the variance is

$\mathrm{var}(x) = kp(1 + p)$,  (8)

and the probability of no precipitation during the
hour is

$P(0) = (1 + p)^{-k}$.  (9)
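
The moments quoted in equations 7 to 9 can be verified numerically for this form of the negative binomial distribution. The sketch below is illustrative: the function name and the parameter values k = 0.5 and p = 3 are arbitrary choices, and the binomial coefficient is evaluated through the gamma function so that non-integer k is allowed.

```python
import math

def neg_binomial_pmf(x, k, p):
    """Probability of x hundredths of an inch under the negative binomial form above."""
    if k <= 0.0 or p <= 0.0:
        # Degenerate case: no rain is possible when either parameter is zero.
        return 1.0 if x == 0 else 0.0
    log_pmf = (math.lgamma(k + x) - math.lgamma(k) - math.lgamma(x + 1)
               + x * math.log(p) - (k + x) * math.log1p(p))
    return math.exp(log_pmf)

k, p = 0.5, 3.0
probs = [neg_binomial_pmf(x, k, p) for x in range(2000)]  # tail beyond 2000 is negligible
total = sum(probs)
mean = sum(x * pr for x, pr in enumerate(probs))
variance = sum((x - mean) ** 2 * pr for x, pr in enumerate(probs))

print(f"sum = {total:.6f}")
print(f"mean = {mean:.4f}      (kp = {k * p})")
print(f"variance = {variance:.4f}  (kp(1 + p) = {k * p * (1 + p)})")
print(f"P(0) = {probs[0]:.4f}      ((1 + p)^-k = {(1 + p) ** -k})")
```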

It  is,  of  course,  not  possible  to  determine  the 
true  form  of  the  distribution  of  x,  since  the  param- 
eters k  and  p  are  presumed  to  be  functions  of  the 
storminess,  which  is  not  physically  defined.  Also, 
the  observed  distribution  of  precipitation  amounts 
is  a  mixture  of  distributions  with  varying  param- 
eters. The  criteria  for  selection  of  the  form  of  the 
distribution  are,  therefore,  the  convenience  of  use, 
and  how  well  the  mixture  fits  the  observed  amounts. 

The  negative  binomial  distribution  has  been  used 
to  fit  short-period  rainfall  amounts.  It  has  the  ad- 
vantage of  including  zero  amounts,  as  compared  to 
a  continuous  form,  such  as  the  Gamma  distribution. 
Finally,  the  skewness  of  the  distribution  permits 
the  kind  of  variation  in  amounts  that  would  reason- 
ably be  expected. 

The  parameters  p  and  k  are  related  to  the  stormi- 
ness parameter  by  the  relations: 
(10)

Figure 1.-Realizations of a second order autoregressive scheme (SOARS): A, $\lambda$ = 192,
$v$ = 0.5; $\mu$ = 0.999, 0.99, 0.95; B, $\lambda$ = 192, $v$ = 0.5; $\mu$ = 0.90, 0.75, 0.50.

Figure 2.-Realizations of second order autoregressive scheme (SOARS): A, $\mu$ = 0.99,
$v$ = 0.5; $\lambda$ = 384, 192, 96; B, $\mu$ = 0.95, $\lambda$ = 192; $v$ = 0.50, 0.25, 0.10.


The  form  that  these  relations  should  take  is 
completely  unknown,  and  it  seems  reasonable  to 
use  a  fairly  simple  form.  Since  q  may  take  negative 
values  whereas  p  and  k  cannot,  the  minimum  value 
is  necessary. 

Whenever either $p$ or $k$ is zero, the probability
of rain is zero. Therefore, the values of $a_p$ and $a_k$
affect the relative frequency of precipitation.
Large positive values increase the opportunity for
precipitation, whereas negative values reduce it.
Since half the values of $q$ should be negative,
values of $a_p$ and $a_k$ of zero should mean that for
half of the hours no rain is possible, and the probability
of no rain is certainly greater than 50 percent.

The  parameters  bp  and  bk  control  the  amount  of 
effect  the  storminess  parameter  has  at  the  specific 
location.  Since  large  values  of  q  are  assumed  to  be 
associated  with  large  precipitation  amounts,  both 
bp  and  bk  are  necessarily  positive.  Large  values  of 
bp  in  particular  increase  the  variance  of  the  local 
process,  reducing  the  correlation  through  space 
and  time. 
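
The local process can be sketched as follows. The exact form of the relations between $q$ and the parameters $p$ and $k$ is not legible here, so the code assumes, purely for illustration, the simple truncated linear form p = max(0, a_p + b_p q) and k = max(0, a_k + b_k q), which is consistent with the verbal description but should not be read as the writer's equation. All function names and numerical values are hypothetical.

```python
import random

def transform(q, a, b):
    """Assumed truncated linear relation between storminess and a local parameter."""
    return max(0.0, a + b * q)

def sample_neg_binomial(k, p, rng):
    """Draw one hourly amount (in 0.01 inch) by inverting the negative binomial CDF."""
    if k <= 0.0 or p <= 0.0:
        return 0  # no rain is possible when either parameter is zero
    u = rng.random()
    x = 0
    prob = (1.0 + p) ** (-k)       # P(0)
    cumulative = prob
    ratio = p / (1.0 + p)
    while u > cumulative and x < 100_000:
        # Recurrence P(x + 1) = P(x) * (k + x) / (x + 1) * p / (1 + p)
        prob *= (k + x) / (x + 1) * ratio
        x += 1
        cumulative += prob
    return x

def hourly_precipitation(q, a_p, b_p, a_k, b_k, rng):
    """Convert a storminess value to p and k, then draw a precipitation amount."""
    p = transform(q, a_p, b_p)
    k = transform(q, a_k, b_k)
    return sample_neg_binomial(k, p, rng)

rng = random.Random(1971)
dry_hour = hourly_precipitation(-1.0, 0.0, 1.0, 0.0, 1.0, rng)
draws = [sample_neg_binomial(0.5, 3.0, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
print(f"amount for negative storminess: {dry_hour}")
print(f"sample mean = {sample_mean:.3f} (kp = 1.5)")
```

Because the draw depends on the past only through $q$, amounts in successive hours are independent given the storminess, as the text assumes.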

Estimation  of  Model  Parameters 

Parameters used for fitting the model are summarized
below:

SOARS parameters (regional):

$\mu = \sqrt{-a_2}$         shape parameter
$v = \mathrm{VAR}(q)$       scale parameter
$\lambda = 2\pi/\theta$     wavelength (hours per cycle)
RATE                        rate of movement (minutes of longitude per hour)

Transform parameters:

$a_p$, $b_p$ (regional)
$a_k$, $b_k$ (local).

In addition, the location of the station in minutes
from an arbitrary baseline must be given.


The SOARS parameters are obtained essentially
independently of local data. The shape parameter
is determined from prior consideration of the
general regularity of the climate. Since $p$ and $k$
are obtained from $q$ by multiplying it by $b_p$ and $b_k$,
and since the scale parameter determines only the
scale of $q$, no variation in the simulated results can
be obtained by varying $v$ that could not also be
obtained by varying $b_p$ and $b_k$. Therefore, the scale
parameter is arbitrarily fixed. The wavelength and
rate are obtained by general consideration of storm
frequency and movement.

The transform parameters must be obtained by
Monte Carlo methods. Generally, this consists of
selecting values of $a_p$ and $b_p$ and then varying $a_k$
and $b_k$ to match two statistics, such as the mean
and P(0) at individual stations. By comparing additional
statistics such as serial correlations and
cross-correlations between stations, values of $a_p$
and $b_p$ can be obtained; $a_p$ is relatively insensitive,
but $b_p$ is very sensitive and cross-correlations can
be controlled rather closely.

Although it would be possible to vary $a_p$ and $b_p$
from station to station, it seems preferable to find
satisfactory values of these parameters for a region
as a whole, and then vary $a_k$ and $b_k$ locally.

To illustrate this procedure, figure 3 is provided.
This shows how $a_k$ and $b_k$ can be obtained at individual
locations by fitting the mean and P(0),
for fixed $\mu$, VAR($q$), $a_p$, and $b_p$. The curves in this
figure have been obtained by Monte Carlo trials.
Although definition of the curves over the range of
interest requires considerable computation, the
procedure is not difficult. Also, since $\mu$ and VAR($q$)
will be held fixed and $a_p$ has little effect, the
problem essentially reduces to one of obtaining
figures similar to figure 3 for a range of values of $b_p$.

Fitting parameter values for simulation at ungaged
points requires (1) prior fitting at gaged points in
the same general area to determine values of
$a_p$ and $b_p$ and (2) regionalized estimates of the mean
precipitation and P(0) at the required points.
The procedure for both gaged and ungaged locations
is illustrated in the example in the next section.

Simulation of the Swannanoa River
Basin - Gaged Locations

To illustrate use of the model, simulation of
precipitation records is carried out for stations in
the Swannanoa River basin. May was selected as
a month with intermediate amounts of precipitation,
but simulation would be carried out in the same
manner for any other month.

Figure 3.-Solution of $a_k$ and $b_k$ as functions of the mean and P(0).

A  contour  map  of  the  basin  is  shown  in  figure  4. 
The  drainage  area  is  approximately  96  square  miles, 
with  a  maximum  elevation  difference  between  the 
outlet  and  the  northeast  boundary  of  nearly  4,000 
feet. 

Higher  elevations  are  heavily  forested,  largely 
with  limited  access  because  two  northern  tribu- 
taries, Beetree  Creek  and  North  Fork,  provide  the 
water  supply  for  the  city  of  Asheville.  There  is  a 
broad,  fairly  level  plain  along  the  river  with  farm 
land,  several  towns,  and  a  major  highway. 


There  are  seven  rain  gages  located  in  or  near  the 
basin  which  were  used  in  this  study  (fig.  4).  Four  of 
these  — Beetree  Dam,  North  Fork,  Craggy  Knob, 
and  Mount  Mitchell— are  recording  rain  gages. 
The  other  three  — North  Fork  2,  Black  Mountain, 
and  Swannanoa— are  standard  gages.  Another 
standard  gage  at  Beetree  Dam  was  not  used. 

There is another gage, known as Mount Mitchell 2 SSW by the Weather
Bureau and Clingmans Peak by the Tennessee Valley Authority, which is
considerably closer to the watershed than Mount Mitchell. However, the
record is broken and does not extend for the entire 1951-70 period of
analysis. Information from this gage, as well as older discontinued
gages at Montreat, Swannanoa Gap, Balsam Gap, and Beetree Gap, was used
in estimating the pattern of average precipitation in the basin.

After the gages were identified, it was necessary to obtain certain
precipitation statistics. These are summarized in tables 1, 2, and 3.
Strong orographic effects are apparent, with the mountain stations,
Craggy Knob and Mount Mitchell, having the highest precipitation
amounts.

The probability of a dry day, P(0), values exhibit the same trend.
Since these values are used in fitting the model parameters, an
additional comment is in order. Experience has shown that cooperative
observers often do not report precipitation amounts on every day of
occurrence. The result is that, while totals are not affected, the P(0)
is larger than it should be. Since the two stations having the largest
values in the basin, Black Mountain and Swannanoa, are both
cooperative, some additional analysis of the surrounding region was
made to determine whether values this high could be expected of valley
stations. Results were affirmative, so the values in table 2 were used.
If bias had been apparent, modification of the figures downward would
have been necessary.

Figure 4. — Contour map of the Swannanoa River Basin.

Table 1. — Monthly statistics for Swannanoa River stations, May 1951-70

[Simulator results in parentheses]

Station name       Mean           Standard deviation   Serial correlation
Beetree Dam        3.14 (3.37)    1.56 (2.32)
Black Mountain     3.39 (3.19)    1.72 (2.13)
Craggy Knob        3.98 (3.89)    1.77 (2.84)
Mount Mitchell     5.24 (4.41)    2.47 (2.65)
North Fork         3.96 (4.10)    1.81 (3.45)
North Fork 2       3.63 (3.51)    1.44 (2.60)
Swannanoa          3.12 (3.36)    1.34 (2.64)
Averages:
  Basin            3.52           1.49
  Basin I          (3.56)         (2.46)
  Basin II         (3.42)         (2.35)

Table 2. — Daily statistics for Swannanoa River stations, May 1951-70

[Simulator results in parentheses]

Station  Station name      Mean             Standard       Probability     Serial
ID                                          deviation      of a dry day    correlation
0650     Beetree Dam       0.1014 (.1086)   0.255 (.346)   0.671 (.699)    0.15 (.46)
0843     Black Mountain    .1094 (.1028)    .270 (.310)    .683 (.697)     .22 (.52)
2120     Craggy Knob       .1289 (.1255)    .300 (.380)    .618 (.650)     .14 (.51)
5921     Mount Mitchell    .1690 (.1421)    .402 (.363)    .576 (.575)     .25 (.58)
6231     North Fork        .1276 (.1322)    .332 (.455)    .645 (.661)     .10 (.51)
6236     North Fork 2      .1170 (.1131)    .275 (.363)    .632 (.638)     .16 (.40)
8442     Swannanoa         .1006 (.1085)    .252 (.380)    .695 (.714)     .17 (.49)
Averages:
         Basin             .1110            .245           .534            .24
         Basin I           (.1149)          (.331)         (.602)          (.61)
         Basin II          (.1102)          (.309)         (.598)          (.63)

Table 3. — Cross-correlations of daily amounts for Swannanoa River
stations, May 1951-70

[Simulator results in parentheses]

Station ID   0650    0843    2120    5921    6231    6236    8442
0650         1       0.65    0.83    0.72    0.69    0.84    0.80
                     (.76)   (.80)   (.72)   (.76)   (.65)   (.77)
0843                 1       .62     .67     .57     .63     .65
                             (.77)   (.74)   (.75)   (.65)   (.77)
2120                         1       .73     .74     .85     .73
                                     [simulated values illegible]
5921                                 1       .64     .72     .73
                                             (.73)   (.73)   (.75)
6231                                         1       .81     .64
                                                     (.73)   (.84)
6236                                                 1       .76
                                                             (.76)
8442                                                         1

Correlation figures show no significant serial correlation between
months, a small positive correlation between days, and strong positive
cross-correlations between stations. Close examination of table 3
indicates that differences between values in the table are probably not
significant. For example, Swannanoa is more highly correlated with
Mount Mitchell than with Black Mountain, which may be in part due to a
different observation time at Black Mountain than elsewhere.

Values of certain of the model parameters were set from prior
information. They are:

    μ = 0.99
    VAR(q) = 1
    A = 192
    RATE = 10.

Given these values, initial Monte Carlo trials showed that
cross-correlations of the required magnitude would be obtained with
values of

    a_p = 0
    b_p = 16.

For this combination of parameters, values of a_k and b_k can be
obtained from figure 3. Values of these parameters used in the
simulation are given in table 4.

Table 4. — Simulation parameters for Swannanoa River stations, May

Station            a_k       b_k      Weight
Beetree Dam        -0.022    0.072    0.256
Black Mountain     -.028     .077     .158
Craggy Knob        -.012     .072     .120
Mount Mitchell     0         .080     .024
North Fork         -.022     .082     .168
North Fork 2       -.012     .072     .118
Swannanoa          -.032     .078     .156

Simulation was then carried out for individual stations. Statistics of
the results have been listed in tables 1, 2, and 3 for comparison with
observations. It may be commented that, although the daily (and
monthly) mean and P(0) are being fitted, the Monte Carlo approach
precludes obtaining the desired values without a great deal of
iterative modification. Since two periods in nature are unlikely to
lead to the same set of statistics, this cannot be considered a
drawback.

Cross-correlations between stations have been matched quite well, and
no improvement appears necessary. On the other hand, serial correlation
between days is excessive, and this has caused excessive standard
deviations in both daily and monthly values. The negative serial
correlation between months is not explainable, but since it would be
removed anyhow when an actual sequence of months would be used, this is
not serious.

Frequency  distributions  of  the  daily  and  monthly 
values  were  also  obtained.  Results  for  Beetree 
Dam  are  shown  in  figures  5  and  6.  Similar  results 
were  obtained  for  all  stations. 

As  an  additional  check  on  the  simulator,  basin 
average  daily  values  were  calculated  both  from  sta- 
tion observed  values  and  station  simulated  values. 
Averaging  was  by  Thiessen  polygons,  modified 
slightly  by  topographic  considerations.  The  weights 
are  given  in  table  4. 
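The weighted averaging described above amounts to a weighted sum of station values. A minimal sketch, using the weights of table 4 and the observed daily means of table 2 (the variable names are illustrative, and the 31-day conversion assumes a full month of May):

```python
# (weight from table 4, observed daily mean from table 2)
stations = {
    "Beetree Dam":    (0.256, 0.1014),
    "Black Mountain": (0.158, 0.1094),
    "Craggy Knob":    (0.120, 0.1289),
    "Mount Mitchell": (0.024, 0.1690),
    "North Fork":     (0.168, 0.1276),
    "North Fork 2":   (0.118, 0.1170),
    "Swannanoa":      (0.156, 0.1006),
}

total_weight = sum(w for w, _ in stations.values())

# Weighted basin average of the station daily means.
basin_daily_mean = sum(w * m for w, m in stations.values()) / total_weight

# Corresponding monthly total for a 31-day May.
basin_monthly_mean = 31 * basin_daily_mean
```

With these inputs the weights sum to 1 and the monthly value comes out near 3.52, the observed basin average listed in table 1. The same weighted sum applied day by day to station records gives the basin average daily series.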

Results  are  summarized  in  tables  1  and  2  as  "basin 
average  I"  and  in  figures  7  and  8  as  frequency  dis- 
tributions. Agreements  and  differences  are  similar 
to  those  for  individual  stations.  Apparently,  cross- 


correlations  between  stations  in  the  simulated  rec- 
ords are  sufficiently  similar  to  those  in  the  observed 
records,  so  that  the  simulated  records  can  be  used 
as  simultaneous  values. 

It  was  also  decided  to  examine  the  effect  on  water- 
shed average  of  various  degrees  of  cross-correla- 
tion. To  do  this,  a  separate  model  was  developed.  A 
table  was  stored  for  each  station  containing  amounts 
distributed  according  to  the  observed  distribution. 
For  the  case  of  no  correlation,  amounts  were 
selected  randomly  from  each  table,  and  the  as- 
sociated average  was  computed.  For  the  case  of 
strong  correlation,  a  single  random  number  was 
obtained,  associated  amounts  from  each  table  were 
selected,  and  the  average  computed. 
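The two sampling schemes just described can be sketched as follows. The station tables here are synthetic stand-ins (about two-thirds dry days, exponentially distributed wet amounts), not the observed distributions, and the function name is hypothetical; only the weights are taken from table 4.

```python
import numpy as np

def sample_basin_average(tables, weights, n_draws, same_random_number, rng):
    """Draw basin averages from per-station tables of amounts.

    same_random_number=True  : one random number indexes every table
                               (the strong-correlation case).
    same_random_number=False : an independent number per table
                               (the no-correlation case).
    """
    n_stations = len(tables)
    if same_random_number:
        u = np.repeat(rng.random((n_draws, 1)), n_stations, axis=1)
    else:
        u = rng.random((n_draws, n_stations))
    columns = []
    for j, table in enumerate(tables):
        ordered = np.sort(table)
        index = (u[:, j] * ordered.size).astype(int)  # u < 1, so index < size
        columns.append(ordered[index])
    return np.column_stack(columns) @ np.asarray(weights)

rng = np.random.default_rng(42)
weights = [0.256, 0.158, 0.120, 0.024, 0.168, 0.118, 0.156]  # table 4

# Hypothetical tables: 620 daily amounts per station.
tables = []
for _ in weights:
    wet = rng.random(620) > 0.67
    tables.append(np.where(wet, rng.exponential(0.3, 620), 0.0))

independent = sample_basin_average(tables, weights, 20000, False, rng)
correlated = sample_basin_average(tables, weights, 20000, True, rng)
```

In this sketch the two cases share nearly the same mean, but the independent case has far fewer dry basin days and a smaller spread, which is the kind of difference in distribution form the comparison is meant to expose.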

Results  of  both  these  procedures  are  also  shown 
on  figures  7  and  8.  The  distribution  of  daily  values 
for  no  correlation  is  significantly  different  from  the 
observed  values.  The  difference  in  form  illustrates 
the  effect  of  cross-correlation  on  the  distribution  of 
the  averages.  The  results  also  indicate  that  the 
cross-correlation  in  the  simulator  may  be  too  high. 


BEETREE DAM, BUNCOMBE, STATION NO. 31-0650
FREQUENCY ANALYSIS OF DAILY RAINFALL FOR THE MONTH OF MAY
01/1951 TO 07/1970

MAXIMUM VALUE   = 2.12                                 S(X)       = 6.2850E+01
UPPER DECILE    = 0.39    NUMBER OF DAYS =    620      S(X*X)     = 4.6711E+01
UPPER QUARTILE  = 0.03    MEAN           =  0.101      S(X*X*X)   = 5.0833E+01
MEDIAN          = 0.00    STD. DEVIATION =  0.255      S(X*X*X*X) = 6.9143E+01
LOWER QUARTILE  = 0.00    SKEWNESS       =  3.685      3RD MOMENT = 6.1158E-02
LOWER DECILE    = 0.00    KURTOSIS       = 16.510      4TH MOMENT = 8.2605E-02

Figure 5. — Cumulative frequency distribution of daily precipitation, Beetree Dam.

BEETREE, BUNCOMBE, STATION NO. 31-0650
FREQUENCY ANALYSIS OF MONTHLY RAINFALL FOR THE MONTH OF MAY
01/1951 TO 07/1970

MAXIMUM VALUE LESS THAN 10.00                          S(X)       = 6.2850E+01
UPPER DECILE    = 5.35    NUMBER OF OBS. =     20      S(X*X)     = 2.4649E+02
UPPER QUARTILE  = 3.45    MEAN           =  3.142      S(X*X*X)   = 1.1662E+03
MEDIAN          = 2.85    STD. DEVIATION =  1.565      S(X*X*X*X) = 6.3876E+03
LOWER QUARTILE  = 1.95    SKEWNESS       =  1.093      3RD MOMENT = 4.1905E+00
LOWER DECILE    = 1.25    KURTOSIS       =  1.014      4TH MOMENT = 2.4075E+01

Figure 6. — Cumulative frequency distribution of monthly precipitation, Beetree Dam.


Simulation of the Swannanoa River
Basin — Ungaged Locations

A primary purpose of the simulator is to generate records at ungaged
locations. A well-gaged watershed had been selected to provide adequate
test results. The same watershed could be used, however, by selecting
arbitrary gage locations and generating records at these locations.
Since results could not be compared with data directly, the watershed
average values would have to serve this purpose.

Accordingly, eight locations were selected to represent samples of the
various topographic features of the basin. Where simulation is being
done for a specific purpose, other selection criteria would be used.
The locations were arbitrarily named stations A through H.

Simulation for these stations would use the same values of the regional
parameters as fitted for the seven real stations. Only a_k and b_k
would have to be obtained for the arbitrary locations. These
parameters, in turn, were to be fitted to (unknown) values of mean
precipitation and the probability of a dry day, P(0).

To determine these values, maps of the basin and adjacent areas were
drawn, using known values at gage locations, estimates at discontinued
gage locations, and topographic features that are known to have an
effect. These maps are shown as figures 9 and 10, with the arbitrary
locations identified. It is then a simple matter to obtain values of
the mean and P(0) at each location and to use figure 3 to determine the
required parameters. These values, together with the weight associated
with each location, are given in table 5.

Results are summarized in tables 1 and 2 as "basin average II" and
plotted in figures 7 and 8. The frequency distribution of daily values
is effectively identical to that for the seven stations and could not
be plotted separately. The satisfactory nature of the results leads to
the conclusion that the generated records at the arbitrary locations
also exhibit the required correlations and can be used as simultaneous
records.

SWANNANOA RIVER WATERSHED AVERAGE, STATION NO. 99-9999
FREQUENCY ANALYSIS OF DAILY RAINFALL FOR THE MONTH OF MAY
01/1951 TO 07/1970

MAXIMUM VALUE   = 2.03
UPPER DECILE    = 0.35
UPPER QUARTILE  = 0.07
MEDIAN          = 0.00
LOWER QUARTILE  = 0.00
LOWER DECILE    = 0.00
KURTOSIS        = 15.720

Figure 7. — Cumulative frequency distribution of daily average basin precipitation, Swannanoa River.

A  Planned  Modification 

The results for the Swannanoa River basin generally exhibit an
undesirably high level of serial correlation. This can be reduced by
increasing the value of b_p. However, since this effectively increases
the local variance, it would also reduce the cross-correlations between
stations.

In previous applications, this adjustment was made by reducing μ. By
reducing the regularity of the SOARS, this reduces the serial
correlation without affecting the cross-correlation. Unfortunately, as
figure 1 shows, such a reduction causes a form of the SOARS which is
hardly related to the concept of the storminess parameter.

An alternative exists which may also solve another problem. The model
as it exists generates hourly amounts of precipitation. To this point,
only the daily totals have been fitted and tested. The hourly amounts,
however, would be desirable for many purposes. Successful hourly
generation will require introduction of a diurnal effect.

It is therefore expected that an additional process will be added to
the model. This process, with both deterministic periodic components
and random components, will vary in time like the SOARS, but will be
fixed by location rather than moving with the SOARS. The random
component will reduce the serial correlation, while the periodic
component will cause hourly amounts to be generated with appropriate
diurnal fluctuations.

Conclusions

This paper has described a model to simulate precipitation records, has
discussed parameter fitting, and has demonstrated by an example both
how to use the model and how well the model works. These results, as
well as results obtained for other locations, indicate that the model
will provide useful results if used carefully.

Table 5. — Simulation parameters for arbitrary locations, May

Location   Monthly   Daily    P(0)          a_k       b_k      Weight
           mean      mean
A          3.55      0.114    [illegible]   -0.020    0.075    0.146
B          4.00      .129     .625          -.014     .075     .107
C          4.45      .144     .62           -.016     .082     .083
D          4.05      .131     .645          -.023     .082     .092
E          3.40      .109     .67           -.023     .072     .075
F          3.65      .118     .63           -.012     .072     .139
G          3.35      .108     .69           -.032     .080     .134
H          3.10      .100     .69           -.028     .072     .224

Figure 8. — Cumulative frequency distribution of monthly average basin precipitation, Swannanoa River.

Figure 9. — Contours of mean monthly precipitation, Swannanoa River Basin.

Figure 10. — Contours of P(0), Swannanoa River Basin.


CROSS  SPECTRA  OF  SHORT  DURATION  RAINFALL 


By D. M. Hershfield and B. Levy


Abstract 

To  examine  the  relative  direction  of  the  incre- 
mental movements  of  rainfall  during  major  storms, 
cross  spectral  analysis  was  performed  on  time  series 
composed  of  15-minute  increments  from  one  major 
storm  for  three  nearby  stations  at  each  of  three  dif- 
ferent watersheds.  A  similar  analysis  was  performed 
with  data  from  a  fourth  watershed  using  a  2-year 
series  of  daily  data.  The  daily  data  appear  to  occur 
randomly  with  time  whereas  the  15-minute  spectra 
all  show  considerable  persistence  as  measured  by 
the  large  amount  of  variance  that  is  explained  by 
the  lower  frequencies.  When  rainfall  is  associated 
with  thunderstorm  activity,  as  in  the  Cherokee, 
Oklahoma  example,  there  are  only  small  differences 
in rainfall characteristics at the three stations which
are separated by less than 3/4 mile. Similar results
are obtained for non-thunderstorm rainfall when the
gage  separation  is  about  two  miles.  Considerably 
more  experimentation  has  to  be  performed  with 
storms  of  long  duration  and  increments  of  various 
lengths  in  order  to  determine  the  relative  movement 
of  rainfall  with  time. 

Introduction 

In  small-watershed  hydrology,  interest  is  generally 
in  the  local-scale  characteristics  of  major  runoff- 
producing  storms.  Analysis  of  storm  totals  from 
dense  rain  gage  networks  in  small  watersheds  has 
resulted  in  several  useful  relationships;  for  example, 
the  storm  total  at  the  gage  in  the  geographical  center 
of  the  watershed  is  generally  a  good  estimator  of 
the  average  depth  of  rainfall  over  the  entire  water- 
shed where  this  depth  is  determined  from  the  mean 
of  the  storm  totals  from  all  the  other  gages  in  the 
watershed. Observed exceptions to this relationship
are the large rainfalls that occur in the Southwest
and are associated with isolated mesoscale
systems.  However,  one  important  rainfall  character- 
istic that  has  received  very  little  attention  is  infor- 
mation on  the  direction  of  the  incremental 
movements  of  rainfall  during  major  storms. 

Measurements  of  rainfall  in  time,  for  intervals 
of  short  duration,  generally  reveal  rapid  and  some- 
times apparently  random  changes.  This  study  con- 
centrates on  these  changes  at  a  fixed  point  and 
areawise  as  sampled  simultaneously  at  three  nearby 
stations  with  the  objective  of  exploring  the  possi- 
bility of  determining  the  relative  movement  of 
short-duration  increments  of  storm  rainfall  from 
cross-spectra  statistics.  The  end  objective  of  these 
empirical  spectral  analyses  is  to  add  another  dimen- 
sion to  hydrologic  analysis.  This,  in  turn,  might 
suggest  new  watershed  models  or  improve- 
ments on  existing  models  for  more  efficient  predic- 
tion in  hydrology.  This  study  examines  the  salient 
time  properties  of  the  incoming  rainfall  and  presents 
them  in  an  easily  interpretable  fashion  for  compara- 
tive purposes. 

Analysis 

Fifteen-minute increments from one major storm for three nearby
stations at the three watersheds, Cherokee, Coshocton, and Sleepers
River, were used to develop the required time series for the spectrum
analysis. At the fourth watershed, Crab Creek, the time series
consisted of a 2-year series of daily rainfalls from three nearby
stations. Spectrum and cross-spectrum analyses of the resulting time
series were performed using the BMD02T of the UCLA Biometrical
Statistical Program Package. The lag period varied with the number of
data points for each series. The resulting spectral information was
available at frequency intervals of 0.001 cycle minute⁻¹ with the high
frequency limit of the spectra being 0.033 cycle minute⁻¹ for the storm
rainfall, or 2 cycles per hour. For the daily data series, spectral
information was available at frequency intervals of 0.01 cycle day⁻¹,
about 4 cycles per year, with a high frequency limit of 0.50 cycle
day⁻¹, 15 cycles per month.
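The high-frequency limits quoted above follow from the sampling interval alone: the highest resolvable frequency is one cycle per two samples. A minimal check of that arithmetic (the function name is illustrative, and a 30-day month and 365-day year are assumed for the unit conversions):

```python
def high_frequency_limit(sampling_interval):
    """Highest resolvable frequency: one cycle per two samples."""
    return 1.0 / (2.0 * sampling_interval)

storm_limit = high_frequency_limit(15.0)   # cycles per minute, 15-minute data
daily_limit = high_frequency_limit(1.0)    # cycles per day, daily data

storm_cycles_per_hour = storm_limit * 60.0     # about 2 cycles per hour
daily_cycles_per_month = daily_limit * 30.0    # 15 cycles per 30-day month
resolution_cycles_per_year = 0.01 * 365.0      # about 3.65 cycles per year
```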

The cross-spectrum function may be written in the form

$$\Phi_{12}(w) = A_{12}(w)\, e^{i\Psi_{12}(w)} \qquad (1)$$

$A_{12}$ is the amplitude of the covariance that can be ascribed to
frequency $w$. $\Psi_{12}$ is the difference in the phase angles of the
two signals at this frequency. Thus, in this form the cross spectrum
yields a measure of the importance of rainfall occurring at $w$ cycles
per minute and the direction of the storm; that is, if $\Psi_{12}$ is
positive, the storm is moving toward rain gage 2, whereas a negative
value indicates movement toward rain gage 1.

The amplitude spectrum is measured in the same units as the original
signals. This makes comparison among different spectra somewhat
difficult. However, this can be avoided by the use of the "squared
coherency." This transforms the amplitude spectrum to a standardized
unitless measurement,

$$R^2(w) = \frac{A_{12}^2(w)}{\Phi_{11}(w)\,\Phi_{22}(w)} \qquad (2)$$

$\Phi_{11}(w)$ and $\Phi_{22}(w)$ are the spectra of the individual
signals. This formula invites comparison with the usual correlation
coefficient. The comparison is proper and the interpretation is apt.
Thus, $R^2(w)$ indicates the degree of agreement in pattern of the two
signals of frequency $w$ and $\Psi_{12}(w)$ indicates the separation of
patterns in time.
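Equations 1 and 2 can be illustrated numerically. The sketch below estimates amplitude, phase, and squared coherency by averaging windowed segments; this smoothing is an assumption standing in for whatever the original program applied, and the series are synthetic. The conjugate is placed on the first series, so under this convention a negative phase means the second series lags the first; the sign convention behind equation 1 may differ.

```python
import numpy as np

def cross_spectral_estimates(x, y, seg_len, dt=1.0):
    """Segment-averaged amplitude, phase, and squared coherency."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    window = np.hanning(seg_len)
    n_seg = len(x) // seg_len
    n_freq = seg_len // 2 + 1
    s11 = np.zeros(n_freq)                 # spectrum of series 1
    s22 = np.zeros(n_freq)                 # spectrum of series 2
    s12 = np.zeros(n_freq, dtype=complex)  # cross spectrum
    for i in range(n_seg):
        sl = slice(i * seg_len, (i + 1) * seg_len)
        f1 = np.fft.rfft((x[sl] - x[sl].mean()) * window)
        f2 = np.fft.rfft((y[sl] - y[sl].mean()) * window)
        s11 += np.abs(f1) ** 2
        s22 += np.abs(f2) ** 2
        s12 += np.conj(f1) * f2
    freqs = np.fft.rfftfreq(seg_len, d=dt)
    amplitude = np.abs(s12) / n_seg
    phase = np.angle(s12)
    coherency_sq = np.abs(s12) ** 2 / (s11 * s22)
    return freqs, amplitude, phase, coherency_sq

# Synthetic example: series 2 is series 1 delayed by two samples plus noise.
rng = np.random.default_rng(0)
base = np.convolve(rng.standard_normal(2048 + 7), np.ones(8) / 8, mode="valid")
series1 = base
series2 = np.roll(base, 2) + 0.1 * rng.standard_normal(base.size)

freqs, amplitude, phase, coherency_sq = cross_spectral_estimates(
    series1, series2, seg_len=128, dt=15.0)
```

Averaging over segments matters: with a single segment the squared coherency is identically 1 at every frequency, so some smoothing is needed before the statistic carries information.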

The usual manner for obtaining a cross spectrum of two signals,
$f_1(t)$ and $f_2(t)$, is first to obtain the so-called cross
correlation:

$$\phi_{1,2}(\tau) = \int_{-\infty}^{\infty} f_1(t)\, f_2(t+\tau)\, dt \qquad (3)$$

This is then Fourier transformed to get the cross spectrum

$$\Phi_{1,2}(w) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi_{1,2}(\tau)\, e^{-iw\tau}\, d\tau \qquad (4)$$

This can be shown to give

$$\Phi_{1,2}(w) = 2\pi\, \overline{F_1}(w)\, F_2(w) \qquad (5)$$

where $F_i(w)$ is a Fourier transform of $f_i(t)$ and
$\overline{F_i}(w)$ is its complex conjugate. Equation 5 shows the
physical interpretation of the cross spectrum most clearly. If either
$F_1(w)$ or $F_2(w)$ is zero at the angular frequency $w$, then
$\Phi_{1,2}(w)$ will be zero. $\Phi_{1,2}(w)$ will be nonzero only if
both $F_1(w)$ and $F_2(w)$ are nonzero at $w$.
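The step from equations 3 and 4 to equation 5 has a discrete counterpart that is easy to verify: the discrete Fourier transform of a circular cross correlation equals the conjugate transform of the first series times the transform of the second. In the sketch below the factor of 2π is absent because the unnormalized discrete transform is used; the series are arbitrary random numbers chosen only for the check.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
f1 = rng.standard_normal(n)
f2 = rng.standard_normal(n)

# Circular cross correlation: sum over t of f1[t] * f2[t + tau].
cross_corr = np.array([np.sum(f1 * np.roll(f2, -tau)) for tau in range(n)])

# Discrete analogue of equation 4 (transform of the cross correlation) ...
from_correlation = np.fft.fft(cross_corr)

# ... and of equation 5 (conjugate transform times transform).
from_transforms = np.conj(np.fft.fft(f1)) * np.fft.fft(f2)

identity_holds = np.allclose(from_correlation, from_transforms)
```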

It  is  also  important  to  note  from  equation  5  that  if 
two  signals  have  power  at  the  same  frequency,  they 
need  not  have  a  nonzero  cross  spectrum,  as  was 
mentioned  above.  This  can  be  most  easily  seen  from 
equation  3.  To  give  nonzero  results,  the  signals 
must  be  coherent  at  this  frequency  as  well. 

There  are  many  ways  to  summarize  the  informa- 
tion contained  in  recorded  signals.  The  raw  data 
itself  contains  all  of  the  desired  information.  This, 
however,  may  not  emphasize  the  phenomena  of 
interest.  In  the  case  under  discussion,  we  are  in- 
terested in  describing  the  pattern  of  intensity  and 
movement  of  rainfall  in  the  watershed.  Is  the  storm 
of  uniform  intensity  and  movement  in  the  water- 
shed? How  many  peaks  in  the  storm?  Does  the  rain- 
fall cover  the  entire  watershed  at  once  or  does  it 
move  slowly  through  the  watershed? 

Insight  into  the  answers  to  these  questions  is 
available  in  the  variation  in  the  signals  singly  and 
jointly.  The  analysis  of  this  variation  may  be  formal- 
ized by  using  the  autocovariance  function  and 
cross-covariance  function  or  preferably  using  the 
power  spectrum  and  cross-spectrum  functions. 
The  former  display  the  portion  of  the  total  variance 
(covariance)  that  may  be  ascribed  to  rainfall  pat- 
terns repeated  over  fixed  time  intervals  (lags).  The 
latter  present  this  information  as  a  function  of  fre- 
quency, say,  cycles  per  minute.  In  particular,  the 
cross  spectrum  will  ascribe  a  significant  contribu- 
tion of  a  given  frequency  if,  and  only  if,  the  con- 
tribution is  large  in  each  of  the  constituent  series 
and  if  the  two  series  are  phase  coherent.  The  mean- 
ing and  usefulness  of  this  should  become  clear  as 
the  examples  are  presented. 

The  results  of  the  analyses  by  watershed  are 
presented  below. 

Crab Creek, Va. — The locations of the three stations used in the
analysis are shown on the map of figure 1. The gages are separated by
about 3,000 feet. Approximately 2 years of daily data (calendar-day)
were used in the analysis; that is, the length of record, N = 725, and
the number of lags, m = 75. The series may be considered weakly
stationary with no distinct trend or seasonality.

Figure 1. — Station location map, Crab Creek Watershed W-1,
Montgomery Co., Va.

Examination of the coherence squared spectrum in figures 2, 3, and 4
indicates that the same frequencies of rainfall are affecting all three
rain gages. This is also indicated directly in the graphs of the
individual spectra. The phase spectrum shows little pattern, as one
might expect because the individual spectra behave randomly.

Cherokee, Okla. — The locations of the three gages, which are separated
by about 3,000 feet, are shown on the map of figure 5. The time series
from the storm of September 14, 1957, was developed from 148 15-minute
amounts. Thirty lags were used in the analysis. Graphical illustrations
of the rainfall time series are shown in figure 6.

The power spectra of figure 7 indicate that the nature of the incoming
rainfall is very similar at the three gages. In addition, the degree of
association between the time series, as measured by the coherence
spectra, shows approximately the same coherence for all combinations of
two gages at the lower frequencies. The three spectra appear to be in
phase at most frequencies. The slight pattern that emerges suggests
that the storm passes through stations 5, 8, and 7, in that order. As
is usual in most storm rainfall, most of the storm is occurring in the
lower frequency range, that is, less than 0.001 cycle minute⁻¹.

Coshocton, Ohio. — The map of figure 8 shows the locations of the three
stations, which are separated by approximately 1,500 feet. The time
series of 15-minute rainfalls for the storm of April 19, 1940, are
portrayed in figure 9. Thirty lags were used in the analysis of the 143
data points.

The power spectra of figure 10 are identical for all practical purposes
with most of the variance accounted for at the lowest frequencies. This
is an indication of persistence of the smaller 15-minute amounts. The
coherence spectra of figure 11 are not significantly different for the
smaller frequencies. The erratic nature of the coherence spectra at the
larger frequencies is undoubtedly due to the relatively small size of
the sample. The lower illustration of figure 12 indicates that the
three spectra are in phase at the lower frequencies. However, the
higher frequency accumulation moves from gage 102 to 103 to 101. Again,
these results may not be too informative because of the relatively
short length of record used to estimate the parameters.

Figure 2. — Various spectral estimates for rain gages 2 and 3,
Crab Creek Watershed W-1.

Figure 3. — Various spectral estimates for rain gages 2 and 4,
Crab Creek Watershed W-1.

Sleepers River, Vt. — The three gages forming the triangle on the map
of figure 13 are separated by approximately 1.5 miles. A total of 206
data points and 50 lags were used in the analysis of the storm of
October 5, 1962. Figure 14 depicts the time series for each of the
three gages.

The power spectra of figure 15 appear to be identical, but the
coherence spectra of figure 16 vary considerably for the different
combinations of rain gages even at the lower frequencies. This is
probably due to the relatively small range in rainfall magnitude. The
cross spectra of figure 17 are in phase at the lower frequencies but
not at the higher frequencies. The pattern of the latter may be
spurious because of the small number of observations upon which it is
based. There is a slight indication in the phase spectrum that the
storm is moving from gage 201 toward gages 60 and 160.

Figure 4. — Various spectral estimates for rain gages 3 and 4,
Crab Creek Watershed W-1.

Figure 5. — Station location map, Cherokee, Okla.

Discussion 

The daily data for Crab Creek and the 15-minute
increments from the three other watersheds exhibit
distinctly different spectra. The daily data appear
to occur randomly in time. This is partially due to
the method of sampling; that is, the use of calendar-day
rainfalls adds to the random component because
the rainfall might overlap 2 days and yet be
considerably less than 24 hours in duration. The
15-minute  rainfall  spectra  all  show  considerable 
persistence  as  measured  by  the  large  amount  of 
variance  that  is  explained  by  the  lower  frequencies. 

The  Cherokee  example  has  shown  that  small  dif- 
ferences in  rainfall  characteristics  occur  in  short 
distances  even  though  the  rainfall  is  associated 
with  thunderstorm  activity.  Opposed  to  this  is  the 
nonthunderstorm example of Sleepers River where
the  differences  in  rainfall  with  time,  at  longer  dis- 
tances than  at  Cherokee,  are  also  nearly  negligible. 

Examination  of  the  phase  of  the  cross  spectra 
shows  very  little  difference  at  the  smaller  fre- 
quencies suggesting  that  the  larger  rainfalls  occur 
nearly  simultaneously  and  are  about  the  same 
magnitude.  The  larger  frequencies  are  erratic  and 
may be spurious because of the short length of
record used to estimate the spectral pattern. Yet,
some indication, although small, does emerge as
to the relevant movement of rainfall with time in
an individual storm. Additional experimentation
with longer storms and with increments of various
durations undoubtedly will help resolve this
problem.

Figure 7. Spectral statistics for gages 5, 7, and 8.

Figure 8. Station location map, North Appalachian Experimental
Watershed, Coshocton, Ohio.

Figure 9. Time series of 15-minute increments for storm of
April 19, 1940.

Figure 10. Power spectra for gages 101, 102, and 103.
Figure 11. Coherence spectra for three combinations of pairs
of rain gages.

Figure 12. Amplitude of cross spectrum and phase of cross
spectrum as a function of frequency for three combinations
of pairs of rain gages.

Figure 13. Station location map, Sleepers River Research
Watershed, Danville, Vt.

Figure 14. Time series of 15-minute increments for storm of October 5, 1962.

Figure 15. Power spectra for gages 60, 160, and 201.

Figure 16. Coherence spectra for three combinations of pairs of rain gages.

Figure 17. Amplitude of cross spectrum and phase of cross spectrum as a function of frequency for three combinations of pairs
of rain gages.


HOURLY  RAINFALL  GENERATION  FOR  A  NETWORK 

By  D.  D.  Franz  ' 


Abstract 

Generation  of  synthetic,  hourly  rainfall  data  for 
a network of stations is investigated because existing
monthly streamflow models are inadequate for
many  purposes  and  because  watershed  models  are 
available  to  accurately  predict  streamflow  given  the 
hourly  rainfall  on  the  watershed.  The  use  of  syn- 
thetic rainfall  with  a  watershed  model  should  give 
more  detailed  traces  of  synthetic  streamflow  than 
the  monthly  total  flows  obtained  from  currently 
available  stochastic  streamflow  models. 

The  storm  model  is  based  on  the  multivariate 
normal  distribution.  Persistence  is  incorporated  in 
the  model  by  including  the  rainfall  in  the  current  and 
previous  hours  as  variables.  A  fractional  power 
transformation  is  used  to  transform  the  observed 
distribution  to  normal  at  each  station.  The  finite 
probability  of  zero  rainfall  is  simulated  by  setting 
all  values  below  a  predetermined  cutoff  to  zero 
before  computing  the  inverse  transformation  in  the 
generating process. The estimation of the storm
model  parameters  is  complicated  by  the  presence 
of  zeros  in  the  data.  Nonlinear  least  squares  is  used 
to  estimate  the  transformation  constant  as  well  as 
the  mean  and  variance  of  the  transformed  data  at 
each  station  in  each  of  four  seasons.  The  covariance 
matrix  for  the  transformed  variables  is  estimated 
by  applying  the  relation  between  bivariate  normal 
variates  to  the  conditional  cumulative  distribution 
function of each pair of variables. The interstorm
model  consists  of  empirically  defined  distributions 
of  interstorm  length  for  50  periods  in  a  year  to 
closely  approximate  the  annual  variation  in  inter- 
storm characteristics. 

The  model  was  tested  on  a  three-station  network 
in northern California. Initial results were very
encouraging.  Further  research  currently  underway 
is  improving  the  methods  used  for  estimating  the 
parameters such that the observed covariance
matrix for the network is reproduced. The earlier
technique  did  not  reproduce  the  covariance  matrix. 

Introduction 

The  primary  motivation  for  generating  hourly 
rainfall  data  is  to  use  such  data  as  the  rainfall 
input  to  essentially  deterministic  watershed 
simulation  models.  The  combination  of  stochastic 
generation  of  rainfall  and  the  deterministic  simula- 
tion of  the  streamflow  in  a  watershed  appears  to 
have  significant  advantages  over  the  more  common 
stochastic  streamflow  generation  techniques.  One 
major advantage is the ability to produce flow
sequences covering the entire spectrum of interest:
from  peak  flow  rates  to  annual  volumes.  The  water- 
shed models  used  for  this  purpose  are  based  on 
known  physical  principles  governing  the  movement 
of  water  in  the  watershed.  Thus,  another  major 
advantage  of  this  method  is  its  inherent  flexibility  to 
accommodate  changes  in  the  watershed.  Stochastic 
streamflow generation techniques are not so
accommodating.  They  require  relatively  long  statisti- 
cally homogeneous  records  to  define  their  param- 
eters. These  parameters  cannot  be  related  to  easily 
observable  changes  in  the  watershed. 

A  fertile  area  of  application  of  this  combination 
would  be  in  the  urban  hydrology  context.  Change  in 
the  hydrologic  characteristics  of  urban  watersheds 
is the rule rather than the exception. Formerly
pervious land areas are covered by buildings, roads,
and parking lots. Natural channels are "improved"
by lining or realignment. The flood plain of the
stream is often heavily developed with the concurrent
loss of the naturally occurring overbank
storage. These changes vitiate the assumptions of
stochastic streamflow generation since the changes
often have a significant effect on the streamflow.
On the other hand, the effect of urbanization on
rainfall is much smaller than the effect on streamflow.
The watershed model parameters can be established
by a relatively short (3 to 5 years) period of
concurrent streamflow and rainfall observations.
The parameter changes caused by channel and impervious
cover modifications in the watershed can
be  evaluated  directly  without  recourse  to  observed 
streamflow  and  rainfall.  The  technique  is,  thus, 
unaffected  by  changes  in  the  watershed  and  can, 
in  fact,  be  used  to  predict  the  effect  of  these  changes 
on  streamflow.  See  Franz  (4)  for  additional 
discussion. 

Rainfall  Characteristics 

A  major  characteristic  of  rainfall  is  its  highly 
sporadic  behavior  in  comparison  to  streamflow. 
Hourly  rainfall  data  reveal  that  periods  of  significant 
rainfall  are  often  broken  by  short  time  intervals 
of  no  rainfall.  Another  characteristic  of  rainfall  is 
the  presence  of  persistence  in  the  shorter  time  in- 
terval data  series.  Hourly  rainfall  exhibits  higher 
persistence  than  daily  rainfall  values.  Monthly  and 
annual  values  of  rainfall  often  appear  to  be  uncor- 
related;  however,  exceptions  do  occur.  Grace  and 
Eagleson (5), Wiser (15), and Pattison (10) discuss
persistence  in  rainfall  at  length. 

The shorter time interval data are needed to adequately
model the response of the watershed. One hour
is the maximum rainfall interval for most
watersheds if peak streamflow values are of interest.
Shorter  time  intervals  are  needed  if  small  or  highly 
urbanized  watersheds  are  to  be  simulated.  Thus, 
the  time  interval  cannot  be  lengthened  to  reduce  the 
problems  caused  by  persistence  because  the 
time  interval  is  already  established  by  other 
considerations. 

The  distribution  of  rainfall  over  a  watershed  is  of 
significance  in  simulating  the  response  of  the  water- 
shed. Significant  runoff  is  often  caused  by  a  rain- 
storm which  affects  only  part  of  the  watershed  area. 
Therefore,  the  capability  for  generating  internally 
consistent  rainfall  values  for  a  network  of  stations 
of  moderate  size  must  be  developed  to  model  these 
effects. 

Model  Structure 

A  number  of  single  station  models  have  been  de- 
veloped to  generate  rainfall  data  stochastically. 
Pattison  (10)  used  a  Markov  chain  to  generate 
traces of hourly rainfall data. Grace and Eagleson
(5) used a multiple stage model in which storm
lengths  were  selected  from  a  distribution.  The  storm 
length  was  used  to  predict  a  storm  depth.  Then  non- 
random  sampling  using  an  urn  model  was  used  to 
fill  in  the  distribution  of  10-minute  interval  amounts 
within  the  storm  length.  Sariahmed  and  Kisiel 
(12), Raudkivi and Lawgun (11), Grayman and
Eagleson  (6),  and  Hiemstra  and  Creese  (7)  have 
also  developed  models  which  are  very  similar  to 
the  model  of  Grace  and  Eagleson.  Models  for  daily 
and  monthly  rainfall  values  have  also  been  de- 
veloped but  are  not  of  interest  for  our  present 
purpose.  All  of  the  above  models  are  restricted  to 
the  single  station  case.  None  of  the  techniques 
appears  to  generalize  tractably  to  a  multiple  station 
case.  The  methods  used  to  generate  streamflow 
data  stochastically  were  also  reviewed,  but  the  very 
sporadic  nature  of  hourly  rainfall  precluded  direct 
use  of  any  of  these  models.  Thus,  a  different  approach 
to  the  modeling  problem  was  explored. 

A  number  of  definitions  are  needed  before  the 
structure  of  the  model  can  be  described.  A  storm 
is  taken  to  be  a  consecutive  series  of  hours  in 
which  each  hour  has  rainfall  recorded  at  one  or 
more  stations  in  the  network.  An  interstorm  is  a 
consecutive  series  of  hours  in  which  no  rainfall  was 
recorded  in  the  network.  A  season  is  taken  to  be  a 
period  of  the  year  in  which  rainfall  characteristics 
are  generally  constant.  Storms  and  interstorms  are 
assigned  to  the  season  in  which  they  begin. 
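
The storm and interstorm definitions above can be sketched in code. This is a minimal illustration, not the author's program: the function name `split_storms` and the list-of-tuples layout for the hourly network record are assumptions made for the example.

```python
def split_storms(hourly):
    """Split an hourly network record into storms and interstorm lengths.

    hourly: list of per-hour tuples, one rainfall depth per station
            (hypothetical layout chosen for this sketch).
    Returns (storms, interstorm_lengths). Each storm is a list of
    consecutive hourly tuples with rain at one or more stations; each
    interstorm length is a count of consecutive hours with no rain
    anywhere in the network.
    """
    storms, interstorms = [], []
    current, dry_run = [], 0
    for row in hourly:
        if any(depth > 0 for depth in row):
            if dry_run:                 # a dry spell just ended
                interstorms.append(dry_run)
                dry_run = 0
            current.append(tuple(row))
        else:
            if current:                 # a storm just ended
                storms.append(current)
                current = []
            dry_run += 1
    if current:
        storms.append(current)
    if dry_run:
        interstorms.append(dry_run)
    return storms, interstorms
```

Note that a single station may record zero rainfall during a storm hour; only an hour that is dry at every station ends the storm.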

A  model  for  storms  and  a  different  model  for 
interstorms  is  proposed  since  the  physical  condi- 
tions during  these  periods  strongly  suggest  that  the 
statistical  properties  of  each  process  will  be 
radically  different.  The  multivariate  normal  dis- 
tribution is  the  basic  component  of  the  storm  model. 
It  is  assumed  that  the  hourly  rainfall  data  for  the 
network  can  be  transformed  such  that  the  trans- 
formed data  will  appear  as  a  "sample"  from  a 
multivariate  normal  distribution.  The  sample  will 
not  be  random  nor  will  it  cover  the  range  of  the 
distribution.  The  nonrandomness  stems  from  per- 
sistence in  hourly  rainfall  data,  whereas  the  limited 
range  is  dictated  by  rainfall  being  a  positive  process 
(that is, negative values have no meaning). The
persistence  is  included  by  treating  the  transformed 
series  as  a  Markov  sequence  of  lag  one.  The  limited 
range  is  included  by  assuming  that  all  negative 
values  have  been  set  to  zero  before  we  observe  the 
sample. Thus, the observed probability of zero rainfall
values can be included in the model.

The  transformation  used  to  normalize  the  marginal 
probability  distributions  was  of  the  form: 

$$ Y = a + bX^{q} \tag{1} $$

where Y is the normal variable; X is the variable
being transformed; and a, b, and q are parameters.
Tukey (14) found transformations of this type to be
quite  powerful  in  normalizing  distributions.  Normal 
marginal  distributions  do  not  insure  that  a  multi- 
variate distribution  is  also  normal.  However,  the 
natural  assumption  to  make  is  that  an  adequate  ap- 
proximation would  be  obtained  by  transforming  the 
marginals  to  the  normal  distribution. 
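
A short sketch of the power transformation and its inverse may help fix ideas. The function names are hypothetical, and the sketch assumes b > 0 and takes the cutoff to be the transformed value at zero rainfall (Y = a); the paper says only that a predetermined cutoff was used.

```python
def to_normal(x, a, b, q):
    """Power transformation Y = a + b * X**q for a nonnegative depth x."""
    return a + b * x ** q


def to_rainfall(y, a, b, q):
    """Inverse transformation with values at or below the cutoff set to zero.

    Assumption for this sketch: b > 0, so y <= a corresponds to zero
    (or physically meaningless negative) rainfall.
    """
    if y <= a:
        return 0.0
    return ((y - a) / b) ** (1.0 / q)
```

Mapping every transformed value below the cutoff to zero is what lets a continuous normal variable reproduce a finite probability of zero rainfall.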

The  above  assumptions  lead  to  the  following  form 
for  the  generating  equation: 

$$ X_{i+1} = \mu + S_1 S_0^{-1} (X_i - \mu) + L Z_{i+1} \tag{2} $$

where

$$ S_0 = E(x_i x_i^T) \tag{3} $$

$$ S_1 = E(x_{i+1} x_i^T) \tag{4} $$

$$ L L^T = S_0 - S_1 S_0^{-1} S_1^T \tag{5} $$

$X_i$ is the vector of transformed rainfall values in
the ith hour, $\mu$ is the vector of means for $X_i$, $S_1$ is
the lag-one covariance matrix for transformed
rainfall, $S_0$ is the lag-zero covariance matrix for
transformed rainfall, $Z_i$ is a vector of standard
normal independent variables, L is a lower triangular
matrix, $x_i$ is the vector of deviations of $X_i$ from $\mu$,
E denotes the expected value operator, and superscript
T denotes the transpose of a vector or
matrix. All the vectors are taken to be column
vectors.
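
The lag-one generating equation can be sketched in plain Python for a small network. This is an illustration under stated assumptions: the helper names are hypothetical, L is obtained by Cholesky factorization (one valid lower triangular choice; the paper does not say how L was computed), and the covariance matrices are taken as already estimated.

```python
import math
import random


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (adequate for small networks)."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(i == j) for j in range(n)]
         for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def cholesky(A):
    """Lower triangular L with L L^T = A (A must be positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def fit_generator(S0, S1):
    """Return (A, L) with A = S1 S0^-1 and L L^T = S0 - S1 S0^-1 S1^T."""
    A = matmul(S1, inverse(S0))
    AS1t = matmul(A, transpose(S1))
    n = len(S0)
    resid = [[S0[i][j] - AS1t[i][j] for j in range(n)] for i in range(n)]
    return A, cholesky(resid)


def next_hour(X, mu, A, L, rng):
    """One step of the generating equation for the transformed vector X."""
    n = len(X)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]   # independent standard normals
    dev = [X[i] - mu[i] for i in range(n)]
    return [mu[i]
            + sum(A[i][k] * dev[k] for k in range(n))
            + sum(L[i][k] * z[k] for k in range(n))
            for i in range(n)]
```

If the residual matrix is not positive definite, which can happen when the covariances are estimated pair by pair, the factorization fails and the estimates would need adjustment.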

The  interstorm  model  consists  of  a  number  of 
empirically  defined  distributions  with  a  separate 
distribution  for  each  of  50  periods  in  the  calendar 
year.  This  model  was  selected  after  extensive  test- 
ing revealed  that  no  single  distribution  could  be 
fitted  accurately  to  the  interstorms.  The  resulting 
distributions  were  smoothed  slightly  to  provide  for 
some  generalization  and  to  fill  in  some  periods  in 
which  data  were  very  sparse. 


Parameter  Estimation 

The  test  network  for  the  model  consisted  of  three 
stations  in  the  Russian  River  basin  approximately 
70 miles north of San Francisco, Calif. The climate
is  cool  and  moist  in  the  winter  and  warm  and  dry  in 
the  summer.  Most  rainfall  is  caused  by  frontal 
activity  superimposed  on  local  orographic  effects. 
The stations used were Yorkville, Redwood Valley,
and  The  Geysers.  The  record  length  used  was  23 
years. 

The  monthly  rainfall  data  were  tested  for  per- 
sistence by  computing  a  linear  regression  between 
consecutive months. The variance ratio test (F-test)
did not reject the hypothesis of no linear relation
at the 5-percent level. The number of runs up
and  down  and  the  circular  serial  correlation  coef- 
ficient were  used  to  test  for  persistence  in  the 
annual  rainfall  series  (based  on  a  climatic  year  from 
July  through  June).  The  hypothesis  of  randomness 
was  not  rejected  by  either  test  at  the  5-percent  level 
of  significance. 

The  season  boundaries  were  established  by  sub- 
jectively evaluating  the  changes  in  the  cumulative 
distributions  of  rainfall  depths,  storm  lengths,  and 
interstorm  lengths  for  50  periods  in  the  year.  Four 
seasons were selected with the following approximate
boundaries: season 1, November 10 to April 5;
season 2, April 6 to June 17; season 3, June 18 to
September 6; and season 4, September 7 to November 9.
Season 1 was the wet season, whereas season 3
was nearly dry. Seasons 2 and 4 served as
transition  seasons. 

The  sequence  of  storms  and  interstorms  was 
tested  for  nonrandomness  by  computing  regressions 
between  the  logarithms  of  the  storm  and  interstorm 
lengths.  This  test  was  applied  to  season  1  only  since 
the other seasons had inadequate data. The F-test
rejected the hypothesis of no relation at the
5-percent level. However, the best relationship
accounted for only 4.5 percent of the variance.
Thus,  the  storm  and  interstorm  sequences  were 
treated  as  being  random  for  the  current  study. 

These  preliminary  tests  were  conducted  to  ensure 
that  none  of  the  assumptions  implicit  in  the  model 
were seriously violated. Fortunately, they were not.
The inclusion of persistence in the monthly and
annual rainfall totals would have made the model
considerably more complex. Significant dependence
between storm and interstorm lengths would have
added further complexity.

An  array  of  rainfall  data  was  constructed  for  each 
season  for  use  in  estimating  the  storm  model  param- 
eters. The  first  row  of  the  three-column  array  was 
filled  with  zeros.  Then  the  first  storm  in  the  season 
was  placed  in  the  array  followed  by  a  row  of  zeros. 
The  next  storm  occurring  in  the  season  was  added 
with  its  terminating  row  of  zeros.  This  was  repeated 
for  each  season  until  all  the  storms  had  been  placed 
in  the  appropriate  array.  There  were  approximately 
17,000  observations  in  season  1  not  counting  the 
delimiting zeros. The other seasons had far fewer
observations. 
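
The array construction described above can be written in a few lines. The function name and the row layout are assumptions for illustration only.

```python
def build_season_array(storms, n_stations=3):
    """Stack the storms of one season, separated by delimiting zero rows.

    storms: list of storms, each a list of hourly rows with one rainfall
            depth per station (hypothetical layout for this sketch).
    """
    zero_row = (0.0,) * n_stations
    rows = [zero_row]                  # leading row of zeros
    for storm in storms:
        rows.extend(tuple(hour) for hour in storm)
        rows.append(zero_row)          # terminating row of zeros
    return rows
```

Each storm is then bracketed by zero rows, so lag-one pairs formed from consecutive rows never link the last hour of one storm to the first hour of the next.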

The  parameters  a,  b,  and  q  were  estimated  for 
each  station  by  fitting  equation  1  to  the  sample 
cumulative  distribution  function  (CDF)  for  each 
station  for  each  season.  The  delimiting  zeros  were 
included  in  this  case  in  an  attempt  to  cause  the 
storm  model  to  generate  the  correct  length  of 
storms.  The  probability  of  a  zero  rainfall  within  a 
storm,  neglecting  the  delimiting  zeros,  varied  from 
0.2  to  0.5.  The  CDF's  were,  therefore,  of  mixed 
type.  As  a  result,  the  rainfall  sample  was  viewed  as 
censored  for  estimation  purposes.  The  total  number 
of  observations  was  correct,  but  all  observations 
less  than  zero  had  been  set  to  zero.  The  probability 
plotting  approach  used  in  the  nonlinear  least  squares 
fitting process does not require complete information
about the magnitudes of the observations since
at least the upper portion of the CDF can be defined.
Thus, the indeterminacy in the transformed
value corresponding to a zero rainfall was avoided.
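
One way to picture the probability plotting fit on a censored sample is sketched below. It is not the author's procedure: a grid search over q with ordinary linear least squares for a and b stands in for the nonlinear least squares routine, the plotting position (i - 0.5)/n is an assumed choice, and the fitted relation is read as z = a + b x^q with z the standard normal quantile.

```python
from statistics import NormalDist


def fit_transform(sample, q_grid):
    """Fit z = a + b * x**q to the upper (nonzero) part of a sample CDF.

    Zeros stay in the ranking so the plotting positions are correct, but
    only nonzero observations enter the least-squares fit.
    Returns (a, b, q, sse) for the best q on the grid.
    """
    n = len(sample)
    ranked = sorted(sample)
    std = NormalDist()
    # (i + 0.5) / n with i counted from zero is the (rank - 0.5) / n position.
    pts = [(x, std.inv_cdf((i + 0.5) / n))
           for i, x in enumerate(ranked) if x > 0]
    z = [zi for _, zi in pts]
    m = len(pts)
    zbar = sum(z) / m
    best = None
    for q in q_grid:
        u = [x ** q for x, _ in pts]
        ubar = sum(u) / m
        suu = sum((ui - ubar) ** 2 for ui in u)
        suz = sum((ui - ubar) * (zi - zbar) for ui, zi in zip(u, z))
        b = suz / suu
        a = zbar - b * ubar
        sse = sum((zi - a - b * ui) ** 2 for ui, zi in zip(u, z))
        if best is None or sse < best[3]:
            best = (a, b, q, sse)
    return best
```

Under this reading, x^q = (z - a)/b, so the transformed rainfall would have mean -a/b and standard deviation 1/b. Whether this matches the original computation of the means and variances from a and b is an assumption.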

The  values  of  a  and  b  were  used  to  compute  the 
means  and  variances  of  the  transformed  rainfall. 
With  these  values  known,  the  covariance  values 
needed  were  computed  by  fitting 

$$ Y = rX + (1 - r^{2})^{1/2} Z \tag{6} $$

to the conditional CDF's for each unique pair of
variables defined by equations 3 and 4. In equation
6, Y and X are correlated standard normal variables,
Z is a standard normal variable, and r is the correlation
coefficient relating X and Y. A plot of Y versus Z
for a given X would yield a straight line on normal
probability paper with a slope of $(1 - r^{2})^{1/2}$ and
intersect the median line at rX. Nonlinear least
squares was again used to estimate r. The value of
r was then used with the variances to estimate the
covariances needed in $S_0$ and $S_1$.

The  fit  obtained  for  the  marginal  CDF's  was 
quite  good  with  less  than  2  percent  of  the  sum  of 
squares  of  the  dependent  variable  left  unexplained. 
The  fit  for  computing  correlation  coefficients  was 
much  poorer  with  as  much  as  30  percent  left  un- 
explained. However,  most  values  fell  in  the  range  of 
10  percent  unexplained.  The  transformation  ex- 
ponents fell  in  the  range  of  0.25  to  0.50.  This  com- 
pares well  with  the  results  obtained  by  Stidd  (13) 
using  a  similar  transformation  process.  The  correla- 
tions between  transformed  rainfall  values  were  in 
the  range  of  0.5  to  0.75. 

The  parameter  estimates  for  the  interstorm  model 
were  made  by  computing  sample  CDF's  for  each  of 
the  50  periods  and  smoothing  them  slightly  as 
pointed  out  earlier.  The  interstorm  lengths  covered 
the  range  from  1  hour  to  over  3,500  hours.  This 
large  range  and  the  sharply  skewed  distributions 
made  fitting  of  more  parsimonious  distributions 
impossible. 

The  Generation  Process 

The generation process of a storm proceeded as
follows:

a. The transformed values are given at each of the
   three stations at time t-1, where t is the first
   hour of the current storm. These values caused
   termination of the previous storm.

b. Draw a sample of three correlated normal random
   numbers using equation 2.

c. Compute the inverse transformation. All negative
   numbers generated in step b are inverted to a
   value of zero.

d. Sum the values of rainfall (results of step c) in
   the current hour. If the sum is zero and nonzero
   sums have occurred before in this storm, then
   the storm is complete; otherwise ignore the
   values and go back to step b. If the sum is
   greater than zero, save the rainfall values and
   go back to step b and generate the next hour
   of rainfall.

After  a  storm  was  complete,  a  value  for  the  following 
interstorm  length  was  computed  by  sampling  from 
the  appropriate  distribution. 
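
The storm generation loop can be sketched as follows. The callables `draw_next` and `inverse` are hypothetical stand-ins for the generating equation and the inverse transformation. One detail is an assumption: when a zero sum occurs before any rain has fallen, the sketch still carries the ignored transformed values forward as the previous hour, which the paper does not spell out.

```python
def generate_storm(draw_next, inverse, last_transformed):
    """Generate one storm from the four steps described above.

    draw_next(prev)  -> next transformed vector (hypothetical callable)
    inverse(y, k)    -> rainfall depth at station k, zero below the cutoff
    last_transformed -> transformed values that ended the previous storm
    Returns (storm_rows, final_transformed).
    """
    storm = []
    prev = last_transformed                  # starting values at time t - 1
    while True:
        prev = draw_next(prev)               # draw correlated normal values
        rain = [inverse(y, k) for k, y in enumerate(prev)]   # invert
        if sum(rain) > 0:
            storm.append(rain)               # save this hour and continue
        elif storm:
            return storm, prev               # zero sum after rain: storm ends
        # zero sum before any rain: ignore the values and draw again
```

The returned transformed values are the ones that terminated the storm, so they can seed the next storm after the interstorm length has been sampled.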
Results

The  model  was  used  to  generate  a  number  of  syn- 
thetic hourly  rainfall  traces  to  test  for  reproduction 
of  selected  statistics.  Initial  testing  revealed  a  con- 
sistent discrepancy  in  the  number  of  0.01  inch  per 
hour  amounts  generated  at  each  station.  The  source 
of  the  discrepancy  could  not  be  pinpointed,  but  the 
fit  of  the  marginal  distributions  at  the  low  end  of  the 
rainfall  scale  was  a  probable  cause.  Fortuitously, 
the sum of the number of occurrences of 0.01 inch
per hour and 0.02 inch per hour events was approximately
the number required to remove the
discrepancy.  Thus,  all  values  greater  than  0.01  inch 
were  reduced  by  0.01  inch  to  effect  this  addition  to 
the  model.  The  reproduction  of  the  marginal  dis- 
tributions improved  markedly  with  this  adjustment. 

The  comparisons  made  in  testing  the  model  can 
only  be  outlined  here.  (See  Franz  (4)  for  additional 
details.)  The  final  test  runs  consisted  of  five  runs 
of  40  years  each  with  each  run  preceded  by  a  2-year 
warmup  period  to  reduce  the  effect  of  initial 
transients.  The  analysis  of  the  results  was  per- 
formed at  the  same  time.  Each  run  took  approxi- 
mately 5  minutes  of  IBM  360/67  computer  time. 
Thus,  approximately  2.5  seconds  was  required  to 
generate  and  analyze  a  station- year  of  hourly  rainfall 
data. 

The  marginal  distributions  were  compared  by 
computing  the  maximum  deviation  in  probability 
between  the  observed  and  generated  CDF's.  In 
this  comparison,  the  delimiting  zeros  in  the  data 
array  were  not  included.  The  maximum  deviations 
obtained  in  the  respective  seasons  starting  with 
season 1 were 0.05 (17828, 30400), 0.06 (3248, 5400),
0.20 (256, 488), and 0.10 (2582, 4970). The first number
in parentheses gives the observed sample size and
the second gives the average size of the five generated
samples. The probability of a zero amount
was reproduced very accurately in season 1 with less
than a 3-percent difference. The other seasons had
discrepancies less than 10 percent.

The  storm  length  distributions  were  compared 
using the Smirnov two-sample goodness of fit
test at the 5-percent level. The hypothesis of no
difference was not rejected in seasons 2, 3, and 4
but was rejected in season 1. A comparison of the
means and standard deviations of the storm lengths
revealed that the difficulty in season 1 was probably
caused by a 20-percent underestimate of the variance.
The mean values differed by only a small
fraction  of  an  hour. 

The  Smirnov  test  was  again  used  in  comparing 
the  monthly  rainfall  distributions.  In  this  case, 
the  sample  sizes  were  23  and  40.  The  approximate 
formula  for  large  sample  sizes  could  not  be  used 
for  computing  the  critical  value  of  the  test  statistic. 
Thus, the method used by Birnbaum and Hall (1)
was used to find the rejection limits. Approximate
limits of 0.34 at the 0.05 level and 0.41 at the 0.01
level  were  found.  Only  12  distributions  out  of  180 
cases  were  rejected  at  the  0.05  level.  None  were 
rejected  at  the  0.01  level. 
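
For reference, the two-sample Smirnov statistic is the largest gap between the two empirical CDF's. The sketch below computes it directly and, purely for comparison, evaluates the familiar large-sample critical value that the paper set aside in favor of the Birnbaum and Hall limits. The function names are hypothetical.

```python
import math


def smirnov_statistic(sample1, sample2):
    """Maximum absolute difference between two empirical CDF's."""
    s1, s2 = sorted(sample1), sorted(sample2)
    m, n = len(s1), len(s2)
    i = j = 0
    d = 0.0
    for x in sorted(set(s1) | set(s2)):
        while i < m and s1[i] <= x:
            i += 1
        while j < n and s2[j] <= x:
            j += 1
        d = max(d, abs(i / m - j / n))
    return d


def large_sample_critical(m, n, coeff=1.358):
    """Asymptotic two-sided critical value; coeff 1.358 is for the 0.05 level."""
    return coeff * math.sqrt((m + n) / (m * n))
```

With sample sizes of 23 and 40, the asymptotic formula gives a 0.05-level limit of roughly 0.36, slightly above the 0.34 quoted above.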

A  comparison  of  monthly  means  and  variances 
revealed  a  serious  discrepancy  in  the  winter  months. 
The  January  and  February  amounts  were  under- 
estimated by  about  10  to  15  percent,  whereas  March 
was  overestimated  by  about  25  percent.  Thus, 
although  the  annual  means  and  standard  deviations 
were  reproduced  well,  the  distribution  of  rainfall  in 
the  wet  season  was  distorted. 

The  mean  vector  and  covariance  matrix  for  the 
rainfall  within  storms  (excluding  delimiting  zeros) 
were  also  compared.  (These  were  computed  for 
rainfall  and  not  transformed  rainfall.)  The  mean 
vector  appeared  to  be  reproduced  within  10  percent, 
and  the  variances  were  also  reproduced  within  the 
same  limits;  however,  the  covariances  were  all 
underestimated  by  20  percent  or  more. 

Another  test  of  the  storm  model  was  the  reproduc- 
tion of  the  observed  relationship  between  the  storm 
length and the storm rainfall depth. A linear
regression  between  these  variables  was  computed 
for  the  observed  and  generated  rainfall.  The  inter- 
cept, the  slope,  and  the  square  of  the  correlation 
coefficient  were  then  compared.  In  season  1,  all 
values  agreed  within  10  percent  of  the  observed 
values.  With  the  exception  of  season  3  (with  very 
limited data), the other values agreed within 10 to 20
percent. 

A final test was made by using the hourly rainfall
at Yorkville as input to a watershed model. The
Hydrocomp Simulation Program was used for this
purpose. (See Hydrocomp (8) for details.) The
stream used was Dry Creek gaged at Cloverdale,
Calif., with a drainage area of 88 square miles. The
flood peaks and the annual volumes obtained from
synthetic rainfall were compared to the same
values obtained using the historic rainfall. The
effect of streamflow prediction error in the watershed
model was thus reduced. The mean and
standard  deviation  for  the  synthetic  streamflow 
were  26.99  inches  and  9.27  inches,  respectively, 
while  like  values  from  historic  rainfall  were  25.17 
inches  and  11.25  inches.  The  mean  annual  flood 
peak  (2.33-year  recurrence  interval)  was  reproduced 
well  as  were  the  return  periods  up  to  10  years. 
However,  larger  return  period  flows  deviated  more 
markedly  with  a  20  percent  underestimate  of  the 
25-year  peak  flow  event.  This  variation,  however, 
is  consistent  with  short  periods  of  record. 

Discussion 

The  performance  of  the  model  was  not  un- 
equivocally successful.  Some  characteristics  of  the 
observed  rainfall  record  were  reproduced  well. 
These  were  the  marginal  distributions,  means  and 
variances  of  storm  rainfall,  and  the  storm  length 
distribution.  The  storm  length  characteristics  were 
not  used  explicitly  in  the  parameter  estimation  and, 
thus,  the  storm  length  results  are  quite  unexpected. 
The  annual  rainfall  values  were  also  duplicated  as 
were  the  annual  flow  volumes  of  streamflow.  The 
relationship  between  storm  rainfall  depth  and  storm 
length  also  appeared  to  be  reproduced  adequately. 

On  the  other  hand,  the  monthly  distribution  in 
season  1  was  distorted,  and  the  covariance  matrix 
was  not  reproduced  at  all  well.  The  necessity  of 
making  the  adjustment  to  the  storm  model  to  repro- 
duce the  low  end  of  the  marginal  distributions  is  also 
a  shortcoming. 

The  monthly  distribution  problem  could  be 
solved  by  using  more  seasons  during  the  winter 
months  to  more  accurately  follow  the  changing 
storm  and  interstorm  characteristics.  The  tech- 
niques for  estimating  the  parameters  can  be 
changed  so  that  reproduction  of  selected  statistics 
is  ensured.  The  marginal  distributions  could  be 
fitted  with  side  conditions  imposed  to  ensure  that 
the observed mean and variance of the storm rainfall
be duplicated. However, this might lead to a
further  degradation  in  the  reproduction  of  the 
complete  marginal  distribution. 

Reproduction  of  the  covariance  structure  could  be 
improved  by  using  the  desired  covariance  in  the 
storm  rainfall  explicitly  in  the  estimation  process. 
This  would  involve  an  iterative  process  using  either 
bivariate  numerical  integration  or  Monte  Carlo 
methods  to  predict  a  rainfall  covariance  given  the 


covariance  of  the  transformed  rainfall.  Preliminary 
studies  subsequent  to  the  development  of  the  rain- 
fall model  have  indicated  that  trial  and  error 
adjustment  processes  using  each  pair  of  variables 
might  be  feasible.  The  covariance  of  the  transformed 
rainfall  would  be  adjusted  until  the  covariance  of  the 
storm  rainfall  was  reproduced  closely  enough. 
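
The adjustment loop described above might look like the following for one pair of stations. This is a sketch under stated assumptions, not the procedure developed at Stanford: the transformed variables are taken as standard normal, the (a, b, q) triples are hypothetical, bisection is used as the adjustment rule, and a fixed seed reuses the same random draws at every trial correlation so that the simulated covariance varies smoothly with r.

```python
import math
import random


def to_rainfall(y, a, b, q):
    """Inverse power transformation with values at or below a set to zero."""
    return ((y - a) / b) ** (1.0 / q) if y > a else 0.0


def rainfall_cov(r, par1, par2, n=4000, seed=1):
    """Monte Carlo rainfall covariance for transformed correlation r."""
    rng = random.Random(seed)            # same draws for every r
    s = math.sqrt(1.0 - r * r)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(to_rainfall(z1, *par1))
        ys.append(to_rainfall(r * z1 + s * z2, *par2))
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def match_covariance(target, par1, par2, lo=0.0, hi=0.99, iters=20):
    """Bisect on r until the simulated rainfall covariance reaches target."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rainfall_cov(mid, par1, par2) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection relies on the rainfall covariance increasing with the transformed correlation, which holds for monotone transformations in expectation. For a full network the pairwise results would still have to form a valid covariance matrix.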

Additional testing of the model is nearing completion
at Stanford University. The model was applied
to a two-station network in North Carolina in which
thunderstorm- and hurricane-induced rainfall occurred.
Preliminary results indicate that the model
will  perform  well  under  these  circumstances  also. 
The  model  was  also  applied  to  a  slightly  different 
network  in  the  Russian  River  basin.  Somewhat 
improved  parameter  estimating  techniques  were 
used,  and  an  additional  season  was  introduced  to 
reduce  the  distortion  of  monthly  rainfall  totals 
during  the  wet  season  (9). 

Improving the performance of the model is of concern, but of greater current concern are the interesting and somewhat vexing problems encountered in developing and testing the model. These problems are principally of a statistical nature. Perhaps the major difficulty encountered at the outset of the development was the lack of sufficiently general yet sufficiently tractable multivariate probability distributions. The resort to the multivariate normal distribution was a necessity because the few other multivariate distributions are either very restrictive or intractable for practical use in data generation. Even the multivariate normal is restrictive in that the regression relationship between variables is a linear one. However, introduction of nonlinear relationships then raises the difficult choice of what form of nonlinear relationship to use.

The use of nonlinear transformations can avoid some of the problems with the multivariate normal. However, the estimation of parameters is complicated when such transforms are used. An interesting study could be made of the effects of various transformations on parameter estimation. A good transformation to start with would be the one given in equation 1. Another problem that might be fruitfully studied is treatment of mixed distribution types in a multivariate distribution.

Difficulty was also encountered in applying traditional tests of hypotheses and goodness-of-fit tests. These tests are fairly well developed and understood when the underlying population is normally distributed or if the sample size is large. However, just how large is large enough is extremely difficult to say, as pointed out by Bradley (2). Distribution-free tests, such as the runs test and the Smirnov test, were used in model testing where they were appropriate. Advances have been made in this area, and given that the sample is a random sample, they often have significant advantages over their normal theory competitors. However, as with the case of linearity versus nonlinearity, as soon as we discard the requirement of the random sample, we find ourselves in a statistical quandary.

For  example,  the  observed  and  generated  marginal 
distributions  were  compared  but  no  statistical 
theory  was  used  to  indicate  how  large  a  discrepancy 
could  be  expected  even  if  the  underlying  mechanisms 
were  identical.  The  fact  of  the  matter  is  that  the 
high  values  of  persistence  in  the  hourly  rainfall  data 
invalidated  all  the  comparison  tests  found  in  a 
search  of  the  literature.  Some  progress  in  statistical 
inference  has  been  made,  if  the  persistence  can  be 
described  by  a  suitable  Markov  chain.  However, 
hydrologists  typically  make  far  greater  use  of 
Markov  sequences.  A  Markov  sequence  is  a  Markov 
process  in  discrete  time  but  with  continuous  state 
space.  Cox  and  Miller  (3)  point  out  that  the  theory  of 
Markov  sequences  is  not  as  fully  developed  as  the 
theory  of  Markov  chains.  They  go  on  to  suggest 
that  such  processes  be  treated  by  more  or  less 
ad  hoc  methods. 

Conclusions 

The current state of statistical theory requires that considerable ingenuity be used to develop a workable rainfall generation model. The characteristics of the rainfall are often very difficult to mimic with the statistical tools currently available. Empirical adjustments and a proliferation of parameters must often be used to obtain an acceptable level of performance. Considerable judgment and trial and error testing will be required for some time to come in the development and in the application of these models.

Additional research on statistical techniques in hydrology is needed. Some fruitful areas of study have already been pointed out. The use of ad hoc methods in applied hydrology will always be required. However, the current and potential utility of statistical methods in hydrology is sufficient to justify continued effort to minimize the use of ad hoc methods.

STOCHASTIC GENERATION OF THE OCCURRENCE, PATTERN, AND
LOCATION OF MAXIMUM AMOUNT OF DAILY RAINFALL

By A. D. Nicks


Abstract 

A four-stage stochastic generation technique is used to synthesize the daily rainfall for a 1,500-square-mile area in central Oklahoma. Synthetic records of up to 9 years are generated at 168 locations of an existing rain gage network. Spatial patterns of rainfall for input to a hydrologic model are constructed by stochastically generating: (1) the occurrence or nonoccurrence of rainfall on each day, (2) the location of the central or maximum amount within the area, (3) the maximum amount, and (4) the pattern of rainfall over the 1,500-square-mile area corresponding to the central amount. Tests are presented for the representativeness and consistency of generated data including means, extremes, and frequency of occurrence in both time and spatial distribution.

Introduction 

Hydrologic models have become more sophisticated in recent years as knowledge about the hydrologic process has increased. Models are now available that use multipoint rainfall as inputs. The reliability of these models in synthesizing records of watershed runoff depends upon the length and climatic variation represented in the rainfall record. While the length of record cannot be effectively extended by synthesizing longer records of rainfall, the sequence of events within the range of observed climatic variations may be rearranged by simulation to make the model a more powerful tool in water resource investigations.

The objective of this study is to synthetically construct rainfall records at points within a network of gages by stochastically generating spatial and temporal patterns of daily rainfall. The method, which is considered to be a first approximation of a rainfall generation model, was developed using 9 years of data collected from a 228-station rain gage network located in central Oklahoma. Patterns of daily rainfall were constructed by stochastically generating the occurrence or nonoccurrence of rain on each day, the location of maximum pattern amount, the maximum rainfall, and the associated pattern of rainfall over the network area.

Network  Data 

Data used in the development of the generation model were collected from the base rain gage network of the Southern Plains Watershed Research Center, ARS-USDA, Chickasha, Okla. Gages of
this  network  are  distributed  in  a  uniform  pattern 
over  a  1,500-square-mile  area  of  the  Washita  River 
basin.  This  distribution  of  gages  is  shown  in  figure  1. 
The  basic  grid  from  which  9  years  of  records  were 
available  consists  of  168  gages.  Spacing  between 
these  gages  is  approximately  3  miles  in  both  the 
north-south  and  east-west  directions.  Records  from 
auxiliary  gages  shown  in  figure  1  were  for  fewer  than 
9  years  and  are  not  considered  in  this  study. 

The  climate  for  this  area  is  moist  to  dry  subhumid. 
Normal  annual  rainfall  ranges  from  33  inches  on  the 
east  to  28  inches  on  the  west  side  of  the  network. 
Annual  point  rainfall  has  been  as  much  as  42  inches 
and  as  little  as  16  inches.  The  difference  between 
annual  extremes  at  gages  30  miles  apart  has  been 
as  much  as  18  inches.  About  98  percent  of  the  yearly 
precipitation  occurs  as  rainfall,  with  the  remaining 
2  percent  occurring  as  snow  or  sleet.  Flooding  can 
occur  anytime,  but  occurs  most  frequently  during 
late  spring  and  early  fall  and  is  associated  with 
thunderstorm  rainfall. 

A summary of the 9 years of data from the network is given in table 1. Annual mean rainfall at the 168 gages has ranged from 20.70 to 34.56 inches. In general, the period of the observed data has been dry. Out of the 108-month record, 77 months are below normal. The number of days per year with 0.01 inch or more of rain at any single gage averages about 65. However, rain at one or more stations of the network occurs about 126 days per year. The distribution of rainfall throughout the year is bimodal, with peaks occurring in May and September. Daily maximum point rainfall, the amount between midnight and midnight at a gage, ranged from 1.60 inches in February to 8.77 inches in September. Approximately 10 percent of the 168 gages have experienced a storm with a return period greater than 100 years.

Generating  System 

To synthetically construct patterns of daily rainfall, a four-phase generating system was developed utilizing the rainfall characteristics of the 9-year historical record. Each phase of the generating sequence is shown in figure 2. In this sequence, each phase is related to the preceding one. The pattern of rainfall is dependent on the maximum amount recorded which, in turn, is influenced by the position of the storm with respect to the network. Finally, all of the phases are dependent on the sequence of occurrence of rainfall events on or near the network.

Table 1. Summary of observed data for 168 gages used in developing the generating method

Year    Jan.   Feb.   Mar.   Apr.    May   June   July   Aug.  Sept.   Oct.   Nov.   Dec.  Annual

Average rainfall, in inches, at 168 stations
1962    0.42   0.82   0.84   2.45   2.77   8.09   2.03   1.23   5.00   2.78   1.39   1.30   29.12
1963    0.24   0.30   1.84   2.67   1.67   3.60   2.78   1.18   2.46   0.61   2.68   0.67   20.70
1964    0.93   2.12   1.22   1.33   6.15   1.27   0.87   3.92   4.56   0.90   5.36   0.72   29.35
1965    1.06   0.82   1.17   2.50   4.01   3.80   0.87   4.89   5.02   1.49   0.08   1.49   27.20
1966    0.53   1.60   1.03   3.71   0.84   2.13   1.47   6.07   3.43   0.42   0.47   0.37   22.07
1967    0.31   0.13   2.18   5.47   3.33   2.52   2.31   1.24   5.30   2.66   0.31   1.11   26.87
1968    2.56   1.31   1.45   2.70   5.87   2.70   3.78   2.16   4.01   2.16   4.71   1.16   34.57
1969    0.60   2.09   2.26   2.18   5.67   3.40   1.47   2.95   4.51   1.68   0.22   1.09   28.12
1970    0.11   0.46   2.58   3.17   3.39   2.13   1.48   1.33   5.38   1.98   0.66   0.32   22.99
Mean    0.75   1.07   1.62   2.91   3.74   3.29   1.90   2.78   4.41   1.63   1.77   0.91   26.78

Number of days with 0.01 inch or more of rainfall within the network
1962       6      6      5     13     10     22     19      7     17      9      7     10     131
1963       5      6     11      9     15     10     11     15     17      6      6      4     115
1964       3      9      7     11     13     14     14     18     16      7     11      6     129
1965       7      6      5     14     16     15     13     16     15      8      7      9     131
1966       6      8      3      8      8     12     16     21     16      2      2      9     111
1967       2      5     11     13     13     16     18     16     16      6      6     10     132
1968      13     10      8     11     21     12     13     16     10      7      9      6     136
1969       7      8     10      7     19     16      9     12     14     16      6      9     133
1970       2      9     13     13     11      9     15     11     17     10      3      4     117
Total     51     67     73     99    126    126    128    132    138     71     57     67   1,135

Maximum daily point rainfall, in inches
        3.43   1.60   2.23   4.34   6.43   6.00   3.86   6.07   8.77   2.98   2.49   2.27    8.77

Figure 2. Flow chart of the rainfall generating system.

Starting with the occurrence of a wet or dry day, the location of the center amount and pattern of daily rainfall are generated by randomly sampling statistical distributions developed to represent each of these characteristics. The uniform random number generator used in sampling was a multiplicative or congruential type tested by MacLaren and Marsaglia (9) and Van Gelder (11). A direct method given by Abramowitz and Stegun (1) was used to calculate random normal deviates.
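
The two random number steps mentioned above can be sketched as follows. The constants of the generator actually used are not given in the text, so the multiplier 16807 and modulus 2^31 - 1 below are an assumption chosen only for illustration. The Box-Muller transform is likewise assumed as the direct method for normal deviates. The names `Lehmer` and `normal_pair` are hypothetical.

```python
import math


class Lehmer:
    """Multiplicative congruential generator: x <- a * x mod m."""

    def __init__(self, seed=12345, a=16807, m=2**31 - 1):
        self.a, self.m = a, m
        self.x = seed % m or 1  # the state must be nonzero

    def uniform(self):
        """Return a uniform deviate in the open interval (0, 1)."""
        self.x = (self.a * self.x) % self.m
        return self.x / self.m


def normal_pair(rng):
    """Box-Muller: two independent standard normal deviates from two uniforms."""
    u1, u2 = rng.uniform(), rng.uniform()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)
```

Each call to `normal_pair` consumes two uniform numbers and returns two normal deviates, so a generating program would typically cache the second value for the next request.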


Sequence  of  Rainfall  Events 

The method selected for generating the number and distribution of rainfall events on the watershed was a two-state Markov chain. This method has been used successfully by Gabriel and Neumann (7), Caskey (3), Wiser (12), and DeCoursey and Seely (5) for generating the occurrence of wet day-dry day sequences in Israel, Colorado, North Carolina, and Texas. This method involves the calculation of two conditional probabilities: (1) α, the probability of a wet day following a dry day, and (2) β, the probability of a dry day following a wet day. The two-state Markov chain for the combination of conditional probabilities is:


                         Future state
                         Dry       Wet
Present state    Dry    1 - α       α
                 Wet      β       1 - β        (1)


Monthly values of α and β were calculated using rainfall of 0.01 inch or more at one or more stations of the network as the criterion for a wet day. These monthly values are given in table 2. Sequences of wet and dry days were determined by alternately generating a uniform random number and testing it against the seasonal value of α or β.
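
The wet day-dry day step can be sketched as below. The function name `generate_wet_dry` and the probability values in the usage note are hypothetical. In the study the values of α and β vary by month, whereas this sketch holds them fixed for brevity.

```python
import random


def generate_wet_dry(n_days, alpha, beta, start_wet=False, rng=None):
    """Return a list of booleans (True = wet day) from a two-state Markov chain.

    alpha: probability of a wet day following a dry day.
    beta:  probability of a dry day following a wet day.
    """
    rng = rng or random.Random()
    wet = start_wet
    days = []
    for _ in range(n_days):
        u = rng.random()
        if wet:
            wet = u >= beta   # stays wet with probability 1 - beta
        else:
            wet = u < alpha   # turns wet with probability alpha
        days.append(wet)
    return days
```

For example, `generate_wet_dry(30, alpha=0.3, beta=0.5)` produces one synthetic 30-day month. Over a long sequence the fraction of wet days approaches α/(α + β), which gives a quick check on a generated record.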

Table 2. Seasonal α and β conditional probabilities, by month
α: probability of a wet day following a dry day.
β: probability of a dry day following a wet day.

Spatial Distribution of Storm Centers

The spatial distribution of storm systems is assumed to be random. If this assumption is correct, then stations of a network should have an equal chance of recording the maximum storm amounts over a long period of time. With sampling of many storm systems by a network, a uniform random distribution of the maximum amounts should result. Figure 3 shows the distribution of maximum daily rainfall occurrences at each station during the historical record. The number of occurrences ranged from 36 at the extreme northwest gage to zero near the center of the network. In general, the larger number of maximums occurred on or near the boundaries. The large values at these stations are attributed to storms which were centered off the network, yet close enough so that their patterns extended to just a few gages near the boundary. In most cases, in the interior of the network, the maximums were considered to be uniformly distributed.

One method of generating locations of pattern maximums is to generate uniform random numbers between 0 and 1. These numbers are multiplied by the number of stations in the network and rounded to the nearest nonzero integer. The resulting number is the station number selected for the pattern center. To compensate for the larger number of occurrences at the boundary of the network, this procedure was modified by adding a hypothetical number of gages at the station locations having the larger number of occurrences. A ratio of the observed value to the expected value at stations with large occurrences was used to determine the number of additional station entries that should be added. The expected value at a station for the historical record was six. Using this adjustment, an array of 220 station numbers was formed.

Generation of a gage location for the storm center consisted of generating a random number between 0 and 1, multiplying it by 220, and rounding to an integer value. This integer identifies the array element that contains the station number for the pattern maximum.
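
A minimal sketch of the weighted station array follows. The station counts in the usage note are invented for illustration, the helper names are hypothetical, and two details are assumptions: every station keeps at least one entry, and the uniform number is truncated to a zero-based index rather than rounded.

```python
import random


def build_station_array(observed_counts, expected):
    """Repeat each station number in proportion to observed/expected maxima.

    observed_counts maps station number -> number of times that station
    recorded the daily maximum. Every station appears at least once.
    """
    array = []
    for station, count in sorted(observed_counts.items()):
        copies = max(1, round(count / expected))
        array.extend([station] * copies)
    return array


def pick_center(array, rng):
    """Map a uniform deviate in [0, 1) to an element of the station array."""
    index = min(int(rng.random() * len(array)), len(array) - 1)
    return array[index]
```

With `build_station_array({1: 36, 2: 6, 3: 0, 4: 12}, expected=6)` station 1 receives six entries and station 4 two, so boundary stations with many observed maximums are selected more often by `pick_center`.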

Maximum  Daily  Rainfall 

Daily rainfall amounts have been fitted to many types of frequency distribution. Brakensiek (2) fitted a log normal distribution to daily rainfall, whereas Stidd (10) used a cube root normalizing procedure. Kotz and Neumann (8) have found the gamma distribution to adequately fit daily rainfall amounts. The best fit for maximum daily rainfall on the network was a skew normal distribution of the form attributed to Fiering (6) given as:


$$
x = \frac{6}{g}\left\{\left[\frac{g}{2}\left(\frac{X-\mu}{\sigma}\right)+1\right]^{1/3}-1\right\}+\frac{g}{6} \tag{2}
$$

where x is the skewed normal variate; X, the raw variate; and μ, σ, and g, the mean, standard deviation, and skew coefficient of the raw variate.

Maximum daily rainfall amounts for each event were determined for each month of the year. Using equation 2 and the mean, standard deviation, and skew, skewed normal frequency curves were calculated. Two of these curves are shown in figures 4 and 5. To generate a daily maximum amount, a random normal number is drawn and the raw variate, X, is calculated using equation 2.
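
Generating a maximum amount requires solving equation 2 for the raw variate. The sketch below assumes that the transformation has the cube-root form x = (6/g){[(g/2)((X - μ)/σ) + 1]^(1/3) - 1} + g/6, which is an assumption about the intended equation. The function names are hypothetical, and the skew coefficient must be nonzero.

```python
import math


def to_skewed_normal(raw, mean, std, skew):
    """Forward transform: raw variate X -> approximately normal variate x."""
    t = (skew / 2.0) * ((raw - mean) / std) + 1.0
    cube_root = math.copysign(abs(t) ** (1.0 / 3.0), t)  # real cube root, any sign
    return (6.0 / skew) * (cube_root - 1.0) + skew / 6.0


def to_raw(x, mean, std, skew):
    """Inverse transform: standard normal deviate x -> raw variate X."""
    t = (skew / 6.0) * (x - skew / 6.0) + 1.0
    return mean + (2.0 * std / skew) * (t ** 3 - 1.0)
```

A daily maximum would then be drawn as `to_raw(z, mean, std, skew)` with `z` a standard normal deviate and the three statistics taken from the month in question. Applying `to_skewed_normal` to the result recovers `z`, which gives a simple consistency check.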

Rainfall  Pattern 

Many studies of area-depth or isohyetal-depth relationships for storm rainfall have been made. Court (4) summarized most of the studies made in the United States and Europe and suggested a possible model for depth of rainfall at a given distance from the storm center as:


In this equation, a and b are constants for scale and ellipticity; Pmax, the storm center rainfall; and Px,y, the rainfall at the coordinates X and Y miles from the center. A similar model, which is given below, was used for the deterministic portions of a pattern generator.

Rainfall depths at stations of the network were generated using the deterministic and probabilistic model

$$
P_j = P_{\max}\, r_j + \sigma z \sqrt{1 - r_j^{2}} \tag{4}
$$

where Pj is the rainfall amount at station j; Pmax, the maximum rainfall; σ, the standard deviation of the storm rainfall; rj, the reduction factor for station j; and z, a standard normal deviate.

The reduction factor, rj, was estimated from observed data by calculating the interstation correlation between Pmax and Pj. Correlation coefficients for patterns near the center of the network were calculated and composited. These correlations were fitted by nonlinear least squares to

$$
r_j = \exp\left[-\left(aX^{2} + bY^{2}\right)\right] \tag{5}
$$

where a and b are regression constants, and X and Y are coordinate distances in miles from the center station to station j. Values of a and b from the nonlinear least squares fit were 0.1386 and 0.0693, respectively, with a multiple correlation of 0.82.
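
The fitted reduction factor can be evaluated directly. This sketch assumes that the distances in equation 5 enter as squares, that is, rj = exp[-(aX^2 + bY^2)]. The function name `reduction_factor` is hypothetical, and the constants are those quoted above.

```python
import math

A_COEF = 0.1386  # regression constant a quoted in the text
B_COEF = 0.0693  # regression constant b quoted in the text


def reduction_factor(dx_miles, dy_miles, a=A_COEF, b=B_COEF):
    """Reduction factor r_j at offsets (X, Y) in miles from the center gage."""
    return math.exp(-(a * dx_miles ** 2 + b * dy_miles ** 2))
```

Under these assumptions a gage one grid spacing (3 miles) from the center has a factor of roughly 0.29 along the X direction and roughly 0.54 along the Y direction, so the fitted pattern is elongated along Y.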

Before generating patterns of daily rainfall using equation 4, the mean and standard deviation of the pattern were required. For generation of these two parameters, the mean and variance of each daily event from 168 gages were calculated for each month of the historical record. Linear regressions on the logarithms of the mean versus maximum amount, and standard deviation versus mean and maximum amount were fitted to these data. Figures 6 and 7 show plots of the mean versus the maximum amounts for April and September. Stochastic models were developed from these regressions by adding the deviation about the mean line as shown in the plots. The models developed for the mean and standard deviation of patterns were:


$$
\mu = \exp\left(a + b \ln P_{\max} + \sigma_{\mu} z \sqrt{1 - r^{2}}\right) \tag{6}
$$

$$
\sigma = \exp\left(c + d \ln P_{\max} + f \ln \mu + \sigma_{\sigma} z \sqrt{1 - r^{2}}\right) \tag{7}
$$


where a, b, c, d, and f are regression constants; z, a random normal deviate; σμ and σσ, the standard deviations of the log values of μ and σ; and r, the multiple correlation coefficient. Regression statistics for these two models, by month, are listed in tables 3 and 4.
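
The two regression models can be sketched as below. The January constants are taken as read from tables 3 and 4. The use of natural logarithms, rainfall in inches, and a separate normal deviate for each model are assumptions, as are the function names `pattern_mean` and `pattern_std`.

```python
import math
import random

# January constants as read from tables 3 and 4.
MEAN_CONSTS = {"a": -1.171, "b": 1.481, "r": 0.903, "sd": 2.317}
STD_CONSTS = {"c": -1.464, "d": 0.883, "f": 0.136, "r": 0.989, "sd": 1.554}


def pattern_mean(p_max, z, k=MEAN_CONSTS):
    """Equation 6: mean of the daily pattern given the maximum amount."""
    noise = k["sd"] * z * math.sqrt(1.0 - k["r"] ** 2)
    return math.exp(k["a"] + k["b"] * math.log(p_max) + noise)


def pattern_std(p_max, mu, z, k=STD_CONSTS):
    """Equation 7: standard deviation of the daily pattern."""
    noise = k["sd"] * z * math.sqrt(1.0 - k["r"] ** 2)
    return math.exp(k["c"] + k["d"] * math.log(p_max) + k["f"] * math.log(mu) + noise)


rng = random.Random(0)
mu = pattern_mean(2.0, rng.gauss(0.0, 1.0))        # pattern mean for a 2-inch maximum
sigma = pattern_std(2.0, mu, rng.gauss(0.0, 1.0))  # matching standard deviation
```

Setting `z` to zero returns the regression line itself, which is a convenient way to compare the sketch with plotted relationships such as those in figures 6 and 7.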
Figure 4. Distribution of maximum daily rainfall for June.

Figure 5. Distribution of maximum daily rainfall for September. Observed values are plotted with the fitted skewed normal distribution; the horizontal axis is maximum daily rainfall, in inches, on a logarithmic scale from 0.01 to 10.0.

Table 3. Regression constants for estimating μ, the mean of the rainfall pattern, in equation 6

Month            a        b        r    Standard deviation
January     -1.171    1.481    0.903    2.317
February    -0.515    1.698    0.950    2.463
March       -1.049    1.528    0.915    2.269
April       -1.716    1.524    0.925    2.359
May         -1.886    1.581    0.927    2.693
June        -2.206    1.524    0.923    2.511
July        -2.495    1.595    0.938    2.598
August      -2.426    1.584    0.926    2.571
September   -1.945    1.561    0.950    2.878
October     -1.547    1.554    0.914    2.604
November    -0.729    1.745    0.949    2.814
December    -0.507    1.693    0.899    2.093


The procedure for generating patterns of daily rainfall, given a pattern center location and amount, was first to generate a mean and standard deviation for the pattern using equations 6 and 7. Next, using the X-Y coordinates of each gage with respect to the center gage, the correlation coefficient, rj, for each station was calculated from equation 5. Starting at the center gage and moving outward to the next nearest gage from the center, rainfall amounts were calculated using equation 4. Rainfall amounts were calculated at each gage in this manner until the mean of the pattern equaled the generated pattern mean from equation 6.
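
The procedure above can be sketched end to end as follows. Several details are assumptions rather than statements of the method used: equation 4 is taken as Pj = Pmax rj + σ z sqrt(1 - rj^2), equation 5 is taken with squared distances, the pattern mean is taken over all gages, gages not reached before the target is met stay dry, and depths are kept between zero and Pmax. The function name `generate_pattern` is hypothetical.

```python
import math
import random


def generate_pattern(gages, center, p_max, mu, sigma, a=0.1386, b=0.0693, rng=None):
    """Sketch of the pattern step.

    gages: dict of station -> (x, y) in miles; center: station of the maximum.
    Returns a dict of station -> rainfall depth in inches.
    """
    rng = rng or random.Random()
    cx, cy = gages[center]
    # Visit gages in order of increasing distance from the center gage.
    order = sorted(gages, key=lambda s: math.hypot(gages[s][0] - cx, gages[s][1] - cy))
    target_total = mu * len(gages)  # pattern mean taken over all gages (assumption)
    depths = {s: 0.0 for s in gages}
    total = 0.0
    for s in order:
        if total >= target_total:
            break  # remaining gages stay dry (assumption)
        dx, dy = gages[s][0] - cx, gages[s][1] - cy
        r = math.exp(-(a * dx * dx + b * dy * dy))           # equation 5 (assumed form)
        z = rng.gauss(0.0, 1.0)
        p = r * p_max + sigma * z * math.sqrt(1.0 - r * r)   # equation 4 (assumed form)
        p = min(max(p, 0.0), p_max)  # keep depths within [0, p_max] (assumption)
        depths[s] = p
        total += p
    return depths
```

On a small test grid with 3-mile spacing, the center gage always receives exactly Pmax because its reduction factor is 1, and the loop stops as soon as the accumulated depth reaches the target implied by the generated mean.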

Test  of  Results 

A digital computer was programmed to combine all phases of the generating system, and 10 synthetic runs of 9-year length were made. Using these generated data, tests were made for the adequacy of each phase and the system as a whole.

A summary of the 10 synthetic runs is shown in table 5. The maximum daily rainfall for each year, the number of days of rainfall, and the mean annual rainfall for the 168 gages are shown in this table. The maximum daily rainfall, 10.05 inches, occurred in September. The maximum number of wet days was 158 and the minimum was 88 days. The maximum mean annual rainfall was 36.86 inches and the minimum was 14.76 inches.

Two types of test were applied to the synthetic records. A two-sided t test was used for testing differences between means of the observed and synthetic records. A nonparametric test (Kolmogorov-Smirnov) was used to test differences in distributions.

The  generation  procedures  for  the  occurrences  of 
wet  and  dry  days  were  tested  for  the  number  of  dry 
days  in  the  synthetic  record  and  the  distributions 
of  the  length  of  dry  days.  The  results  of  these  tests 
are  given  in  table  6  for  the  distributions  of  the 
length  of  dry  days  for  two  of  the  runs,  and  in  table  7 
for  the  total  number  of  dry  days.  Neither  test 
shows  a  significant  difference  at  the  0.05  level 
between  the  synthetic  and  observed  data  sets. 
The  serial  correlation  coefficients  for  rainfall 
amounts  on  wet  days  following  wet  days  during 
each  month  were  calculated.  Tests  made  on  these 
data  at  the  0.05  level  indicated  that  rainfall  amounts 
were  independent. 

Kolmogorov-Smirnov  two-sample  tests  were  made 
on  the  distribution  of  maximum  amount  for  each 
month.  An  example  of  this  test  for  June  is  shown 
in  table  8.  Results  of  tests  for  all  months  showed 
no  significant  difference  between  the  generated 
and  observed  distributions. 
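
A two-sample Kolmogorov-Smirnov comparison of this kind can be sketched with the standard library alone. The function names are hypothetical. The critical value uses the usual large-sample approximation with the coefficient 1.36 for the 0.05 level, which may differ from the tabulated values used for small samples.

```python
import math


def ks_two_sample(sample_a, sample_b):
    """Return the two-sample Kolmogorov-Smirnov statistic D."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical distribution functions past x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_critical(n_a, n_b, coefficient=1.36):
    """Large-sample critical value; 1.36 corresponds to the 0.05 level."""
    return coefficient * math.sqrt((n_a + n_b) / (n_a * n_b))
```

The generated and observed maximum amounts for a month would be judged not significantly different when `ks_two_sample(observed, generated)` falls below `ks_critical(len(observed), len(generated))`.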

To  test  the  system  as  a  whole,  two  tests  were 
made  on  the  synthetic  data.  The  first  was  a  two- 
sided  t  test  on  the  accumulated  monthly  and  annual 
means  for  the  entire  network.  The  second  was  a 
test  on  the  monthly  and  annual  records  at  each 
individual  gage  in  the  network.  The  test  statistics 
for  the  network  means  are  given  in  table  9. 
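
The comparison of means can be illustrated with a pooled-variance two-sample t statistic. The text does not state which form of the t test was used, so the pooled form is an assumption, and the function name `pooled_t` is hypothetical. The data are the annual network means from table 1 and from synthetic run 1 in table 5.

```python
import math


def pooled_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for a pooled-variance test."""
    na, nb = len(sample_a), len(sample_b)
    ma, mb = sum(sample_a) / na, sum(sample_b) / nb
    ssa = sum((x - ma) ** 2 for x in sample_a)
    ssb = sum((x - mb) ** 2 for x in sample_b)
    df = na + nb - 2
    pooled_var = (ssa + ssb) / df
    t = (ma - mb) / math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return t, df


# Annual network means: observed record (table 1) and synthetic run 1 (table 5).
observed = [29.12, 20.70, 29.35, 27.20, 22.07, 26.87, 34.57, 28.12, 22.99]
run_1 = [22.94, 34.52, 36.86, 31.47, 23.13, 30.23, 25.79, 24.78, 26.12]

t_stat, df = pooled_t(observed, run_1)
T_CRIT_16 = 2.120  # tabulated two-sided 0.05 critical value for 16 degrees of freedom
significant = abs(t_stat) > T_CRIT_16
```

For this pair of records the statistic is about -0.75, well inside the critical value, so run 1 taken alone would not be judged different from the observed record in its annual mean.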

Mean monthly rainfall for the 10 synthetic runs was not significantly different from the means of the historical record except in April and August. The difference between the means of the generated and observed data for these months was only slightly larger than the critical difference at the 0.05 level. Despite these two test failures, the annual mean was not significantly different from the observed annual mean, indicating that the overall generation of the mean rainfall for the network area was acceptable.

Table 4. Regression constants for estimating σ, the standard deviation of the rainfall pattern, in equation 7

Month            c        d        f        r    Standard deviation
January     -1.464    0.883    0.136    0.989    1.554
February    -1.5??    0.708    0.194    0.980    1.46?
March       -1.558    0.767    0.170    0.981    1.429
April       -1.400    0.801    0.181    0.989    1.??9
May         -1.257    0.721    0.250    0.9?4    1.792
June        -1.185    0.699    ?        0.993    1.?25
July        -1.1?9    0.684    0.293    0.997    1.785
August      -1.195    ?        0.264    0.99?    1.748
September   -1.280    0.719    0.249    0.996    1.9?2
October     -1.388    0.709    0.231    0.989    1.6?2
November    -1.562    0.672    0.225    0.987    1.663
December    -1.553    0.782    0.167    0.976    1.222

Table 5. Maximum daily rainfall, number of wet days, and mean annual rainfall for each synthetic run

                      Year of synthetic run
Item                        1      2      3      4      5      6      7      8      9

Run 1
  Maximum P              4.31   4.86   5.16   8.36   4.92   4.91   4.31   4.60   4.76
  Number of wet days      137    141    134    128    127    126    110    136    113
  Mean annual P         22.94  34.52  36.86  31.47  23.13  30.23  25.79  24.78  26.12
Run 2
  Maximum P              4.54   8.32   7.56   5.67   7.89   7.94   4.28   5.37   6.30
  Number of wet days      119    135    124    127    140    109     88    134    120
  Mean annual P         27.91  23.41  25.30  27.04  29.67  30.69  22.94  21.39  21.57
Run 3
  Maximum P              7.52   6.29   4.48   8.14   4.66   6.47   8.06   4.87   4.41
  Number of wet days      106    112    120    121    130    117    106    123    132
  Mean annual P         15.12  24.05  29.51  27.38  24.77  24.28  28.42  26.10  20.23
Run 4
  Maximum P              4.91   3.77   6.72   4.22   7.46   4.15   5.36   5.77   4.85
  Number of wet days      126    107    115    107    133    138    116    127    112
  Mean annual P         31.78  17.63  22.15  19.81  27.16  27.42  24.87  29.26  15.71
Run 5
  Maximum P              3.94   5.41   4.68   7.33   7.68   6.41   7.71   4.64   5.96
  Number of wet days      126    112    118    138    143    135    117    119    126
  Mean annual P         17.27  22.83  16.96  30.30  30.77  34.10  24.33  18.43  22.57
Run 6
  Maximum P              3.95   5.38   6.33   4.69   6.41   5.37   6.86   5.80   6.46
  Number of wet days      124    107    132    138    124    158    126    153    129
  Mean annual P         29.85  20.65  30.27  28.74  28.75  27.20  27.86  30.61  18.90
Run 7
  Maximum P              4.69   7.50   4.81   4.63   7.34   5.76   4.37   6.73   4.70
  Number of wet days      134    139    125    114    110    139    115    103    127
  Mean annual P         23.91  33.94  19.79  14.76  21.73  28.96  22.57  23.27  21.71
Run 8
  Maximum P              5.00   5.10   5.13   8.11  10.05   5.39   8.67   8.20   3.33
  Number of wet days      107    137    120    119    109    121    126    135    137
  Mean annual P         18.78  21.13  29.80  27.78  21.63  24.36  31.57  27.76  30.53
Run 9
  Maximum P              5.35   3.17   7.89   4.48   7.75   5.09   6.81   5.97   6.09
  Number of wet days      142    114    130    119    140    115    138    129    136
  Mean annual P         26.61  24.59  25.65  21.52  32.32  20.57  25.41  30.20  26.43
Run 10
  Maximum P              4.22   7.15   3.17   6.55   6.77   5.38   4.93   4.84   6.45
  Number of wet days       96    117    125    130    133    128    126    133    131
  Mean annual P         20.29  23.79  21.26  26.82  28.57  24.64  24.02  35.66  25.08

P = rainfall.

The two-sided t test described previously was applied to the individual gages of the network. It is not possible to show all of the tests; however, a summary of the tests, giving the number of gages by month as well as for annual mean rainfall, is shown in table 10. A significance level of 0.05 was used in all tests. Approximately 71 percent of the individual gage records were accepted as being representative of the observed record on an annual basis. Percentages of gages accepted on a monthly basis ranged from a low of 56 percent in December to a high of 97 percent in March.

In general, the locations of gages for which tests indicated the synthetic record was different from the observed record were clustered in the northwest corner of the network. Figure 8 shows the distribution of these gages for the test on mean annual rainfall. Gages where monthly records were different were located in the same general area.

Discussion 

On the basis of tests made on the generating system, it appears
that some phases performed satisfactorily, whereas others will
require further refinement. The Markov chain method of generating
the wet day-dry day sequences for a large area was highly
satisfactory and comparable to results on point rainfall obtained
by other investigators. Also, the method for generating maximum
daily amounts gave satisfactory results. However, needed
improvement in the method of generating patterns of rainfall is
indicated by the poor results obtained in the northwest section of
the network.

Generation of the mean and standard deviation of daily rainfall
patterns using equations 6 and 7 appears to be a satisfactory
method of controlling the volume of rainfall that should be
distributed to network gages. Results in table 9 indicate that the
synthetic monthly and annual rainfall for the network was not
significantly different from the mean of the historical record.
However, the distribution of rainfall at individual gages,
especially those in the northwest part of the network, was not
satisfactory. This can be attributed to the distribution model
given by equations 4 and 5. The interstation correlation patterns
calculated from equation 5 were essentially fixed in size and
orientation. It may be necessary to treat the reduction factor, rj,
as a random variable because of the low amount of explained
variance indicated by the multiple correlation obtained in fitting
equation 5 to the observed data. Also, it is possible that these
correlation patterns should vary with the magnitude of the maximum
storm amount, the orientation of the storm pattern, and the season
of the year.

For example, a storm pattern for a day in October is shown in
figure 9. The maximum rainfall was 2.08 inches, and the mean
rainfall for the network, generated by equation 6, was 0.40 inch
with a standard deviation (from equation 7) of 0.615 inch. The
calculated mean and standard deviation of the rainfall pattern,
using the individual gage data generated by equation 4, were 0.40
and 0.613 inch, respectively. The standard deviation, mean,
interstation correlation pattern, and orientation all act to limit
the size and distribution of the pattern rainfall. If a storm with
these same characteristics were centered in the northwest section
of the network, a much different pattern would have been developed.
The generation method used would have produced a pattern elongated
to the southeast.

Figure 8. — Distribution of accepted and rejected gages for annual rainfall records.

More testing than has been presented in this paper should be
carried out on the generating system. Of course, the ultimate use
of such a generating model will be to use generated data as input
to a hydrologic model and test the hydrologic response.

Conclusions 

A four-phase system of generating spatial and temporal patterns of
rainfall on a 1,500-square-mile area was developed. The system,
which is considered to be a first approximation to a rainfall
generation model, could be used to generate synthetic input data
for a hydrologic model. Tests made


Table 6. — Cumulative distributions of length of dry periods for observed and synthetic records

Length of       Observed   Synthetic   Maximum      Synthetic   Maximum
dry period      record     record      difference   record      difference
(days)                                 (percent)                (percent)

  1             0.589      0.593                    0.594
  2              .686       .708       2.2 ¹         .688
  3              .764       .775                     .757
  4              .805       .816                     .825       2.0 ²
  5              .845       .847                     .854
  6              .875       .880                     .883
  7              .902       .905                     .906
  8              .923       .927                     .949
  9              .941       .942                     .934
 10              .952       .957                     .943
 11              .962       .968                     .955
 12              .973       .974                     .961
 13              .978       .978                     .964
 14              .981       .985                     .973
 15              .986       .987                     .977
 16              .989       .990                     .980
 17              .992       .990                     .984
 18              .993       .991                     .984
 19              .995       .993                     .987
≥20             1.000      1.000                    1.000

Total number of events      1,135      1,152                    1,096
Maximum length (days)          31         28                       41

¹ $D_{0.05} = 5.7$.
² $D_{0.05} = 5.8$.
$D_{0.05} = 1.36 \big/ \sqrt{n_1 n_2/(n_1 + n_2)}$.
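The critical difference quoted in the footnote to table 6 can be checked numerically. The following is a minimal sketch, assuming the large-sample two-sample Kolmogorov-Smirnov form $1.36\sqrt{(n_1+n_2)/(n_1 n_2)}$ at the 0.05 level; the function names are illustrative and are not taken from the paper.

```python
import math

def ks_critical_difference(n1, n2, coefficient=1.36):
    """Large-sample two-sample K-S critical difference.

    coefficient=1.36 is the usual value for the 0.05 level (assumed here).
    """
    return coefficient * math.sqrt((n1 + n2) / (n1 * n2))

def max_cdf_difference(cdf_a, cdf_b):
    """Largest absolute gap between two cumulative distributions
    tabulated at the same class limits."""
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))

# Event counts from table 6 (observed record and two synthetic records).
n_observed, n_synth1, n_synth2 = 1135, 1152, 1096
d1 = 100 * ks_critical_difference(n_observed, n_synth1)  # in percent
d2 = 100 * ks_critical_difference(n_observed, n_synth2)

# First five rows of table 6: observed and first synthetic record.
observed = [0.589, 0.686, 0.764, 0.805, 0.845]
synthetic = [0.593, 0.708, 0.775, 0.816, 0.847]
gap = 100 * max_cdf_difference(observed, synthetic)

print(round(d1, 1), round(d2, 1), round(gap, 1))
```

A synthetic record is accepted when its maximum difference (2.2 percent for the rows used above) is smaller than the critical difference.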


Table 7. — Number of dry days in the observed and 10 synthetic records

Observed record                     2152

Run               1     2     3     4     5     6     7     8     9    10
Dry days       2135  2191  2220  2207  2153  2096  2181  2177  2144  2168

Mean of the 10 runs                 2167.2
Difference from observed            15.2
Standard deviation of mean          11.6


on the data generated by the system show that 71 percent of the
synthetic records at 168 stations would be accepted as being from
the same population as the observed data. A Markov chain model for
generating wet day-dry day sequences for the network area was
highly satisfactory, as was the method of generating mean rainfall
for the area. Further refinement of the method of generating
rainfall patterns is needed.


Table 8. — Kolmogorov-Smirnov two-sample test on the distribution of maximum
amounts during the month of June

Rainfall    Historical      Sample 1        Maximum      Sample 2        Maximum
amount      cumulative      cumulative      difference   cumulative      difference
(inches)    distribution    distribution    (percent)    distribution    (percent)

0.01        0.008           0.007                        0.008
 .02         .048            .080                         .118
 .03         .056            .088                         .125
 .04         .063            .10                          .140
 .05         .095            .11                          .156
 .06         .127            .125                         .164
 .08         .143            .147                         .187
 .10         .175            .161                         .203
 .15         .190            .191                         .296
 .20         .222            .242                         .343           12.1 ²
 .25         .286            .272                         .382
 .30         .319            .285                         .416
 .35         .341            .301                         .446
 .40         .349            .331                         .461
 .45         .364            .346                         .468
 .50         .397            .387                         .500
 .60         .490            .492                         .547
 .70         .513            .520                         .570
 .80         .541            .572                         .609
 .90         .556            .618                         .646
1.00         .587            .676                         .664
1.2          .658            .753           9.5 ¹        [illegible]
1.4          .686            .756                         .774
1.6          .763            .822                         .802
1.8          .804            .860                         .836
2.0          .821            .904                         .868
2.5          .882            .928                         .900
3.0          .924            .958                         .915
3.5          .949            .973                         .960
4.2          .964            .974                        1.000
4.5          .968            .986
5.0          .979            .988
5.5          .986            .990
6.0         1.000            .991
7.0                          .992
8.3                         1.000

Number of
observations  126            136                          128

¹ $D_{0.05} = 16.8$.
² $D_{0.05} = 17.08$.
$D_{0.05} = 1.36 \big/ \sqrt{n_1 n_2/(n_1 + n_2)}$.
Table 10. — Synthetic records accepted

                    Gages accepted
Month            Number         Percent

January          [illegible]    71
February         111            66
March            [illegible]    97
April            125            74
May              118            70
June             125            74
July             154            92
August           122            72
September        119            71
October          159            95
November         119            71
December          94            56
Annual           [illegible]    71
COMMENTS ON THE STATISTICAL DISTRIBUTION OF RAINFALL PER
PERIOD UNDER VARIOUS TRANSFORMATIONS ¹

By P. M. Skees and L. R. Shenton ²


Abstract

The statistical distribution of rainfall amounts over relatively
long periods (months, years) has received considerable attention in
the literature; generally, satisfactory fits are achieved for
selected locations and seasons by using well-known models such as
the gamma, logarithmic normal, and normal distributions. For
moderate to small intervals of observation (weeks, days, hours),
satisfactory distributional models are much more difficult to find,
although more observations are available.

Considerable skewness in rainfall distributions arises because of
the relatively numerous occurrences of "no-rain" days; if these are
omitted, one source of heterogeneity is removed. Amounts of
rainfall on "wet" days can still, over short periods, be very
skewed, and we consider over 100 transformations designed to
eliminate this lack of symmetry. About 30 locations in the
Southeastern States are studied, the periods of observation being
days, hours, and "storm" periods. In most cases, the observations
involved were those included in 10-year records (1955-64). The
models tried were the gamma, censored gamma, and generalized gamma
distributions. Since very considerable data analysis is involved,
we used simple criteria for accepting or rejecting models. For
example, we have used moment methods of estimation, and sample
values of the skewness and kurtosis to judge departures from the
gamma and normal distributions.

Each location has a best transformation for producing near
normality, or approximate gamma shape, for rainfall amounts over
days or hours. Root transformations ($\sqrt[n]{x}$, where x is the
amount per period) are often useful in reducing skewness, but they
are only partly effective in reducing kurtosis.


¹ Research sponsored by the Southeastern Experiment Station with
U.S. Department of Defense Funds, IPR-19-8-8001.
² Computer Center, University of Georgia, Athens.

Introduction 

The statistical distribution of amounts of rainfall over months or
years has received considerable attention (see, for example,
Manning (28, 29), Gupta (17), Markovic (30), Mooley and Crutcher
(32), Thom (43, 44, 45), Friedman (15), Kotz and Neumann (25), and
Amorocho and Brandstetter (2)), but for shorter periods, such as a
day, an hour, or even less, little has been published (see Das (9),
Neyman and Scott (33)). Reasonably good fits of observed data over
months (or years) have been achieved under the assumption that the
theoretical distribution is a gamma distribution (involving two or
three parameters); however, no single distribution is acceptable
over widely differing rainfall periods.

When the distribution of amounts of rainfall over shorter intervals
of time (1 hour, for example) is considered, it is much more
difficult to find acceptable theoretical distributions. One reason
for this is variability. If one examines figure 1, which compares
the monthly rainfall of New York City and Atlanta, Ga., over a
5-year period, one can readily detect the variability of monthly
rainfall as well as the fact that the deviation is greater than the
mean for both hourly and daily precipitation. Table 1 shows the
mean, standard deviation, skewness
($\sqrt{\beta_1} = \mu_3/\mu_2^{3/2}$), kurtosis
($\beta_2 = \mu_4/\mu_2^2$), and Pearson Type III or gamma test
($2\beta_2 - 3\beta_1 = 6$ exactly for these distributions) for
several weather stations in the southeastern United States. The
parameter values are further dichotomized into hourly (positive and
total) and daily (positive and total) precipitation amounts.
Examination of the means and standard deviations shows that hourly
and daily precipitation amounts are quite variable. In addition,
when we compare the skewness and kurtosis to the skewness and
kurtosis for the normal distribution (0.0 and 3.0, respectively),
we see that rainfall is highly skewed. Notice that for hourly
rainfall at Greensboro (all hours included) the skewness $\beta_1$
is nearly 300, which is statistically large. Even when dry hours
are eliminated, the skewness is still around 25, which again is
statistically large. This property of rainfall severely limits the
range of statistical distributions which are applicable. Another
problem associated with the analysis of short-period precipitation
is that of zero and trace rainfall, which effectively involves
censoring; that is, we know how many times an event occurs but are
unable to measure the event with a sampling device (in this case,
the rain gage).
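The moment ratios used in table 1 can be computed directly from a sample. The sketch below assumes central moments with divisor N (the paper does not state the divisor used), and the function names are illustrative only. The last lines re-derive the Type III criterion for hourly rainfall at Greensboro from the rounded skewness and kurtosis printed in table 1.

```python
def moment_ratios(values):
    """Return mean, standard deviation, beta1, beta2 and the Type III
    criterion 2*beta2 - 3*beta1 - 6 (zero for a gamma distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # divisor N is an assumption
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    beta1 = m3 ** 2 / m2 ** 3
    beta2 = m4 / m2 ** 2
    return {
        "mean": mean,
        "sd": m2 ** 0.5,
        "beta1": beta1,
        "beta2": beta2,
        "type3": type3_criterion(beta1, beta2),
    }

def type3_criterion(beta1, beta2):
    """Departure from the Pearson Type III (gamma) line."""
    return 2.0 * beta2 - 3.0 * beta1 - 6.0

# Greensboro, hourly, all observations: sqrt(beta1) = 17.57, beta2 = 483.34.
greensboro = type3_criterion(17.57 ** 2, 483.34)
print(round(17.57 ** 2, 1), round(greensboro, 1))
```

Because the tabulated skewness and kurtosis are rounded, the recomputed criterion agrees with the printed 34.8 only to within a few tenths.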

Essenwanger (10) and Das (9) have been aware of the effect of
censoring due to the limitations of the rain gage. Essenwanger
proposed that one should use positive precipitation to determine
the true frequency of zero rainfall. He also suggested that
precipitation amounts consist of a sum conglomerate of log-normally
distributed quantities, arising from rainfall components belonging
to different weather collectives. From this point of view, one
might fit a statistical mixture of three lognormal distributions to
rainfall amounts, corresponding to light, moderate, and heavy
intensities of the storms. Das, on the other hand, applied both the
censored and truncated gamma distributions to daily rainfall in
Australia with reasonably good results.

Another approach to the distribution of precipitation has been the
use of transformations to obtain normally distributed data. Stidd
(40, 41, 42) proposed a system that determines root
transformations, with emphasis on the cube root, to obtain
normalized data. Kendall (23) considered the approximate normality
of the cube root of monthly rainfall (June, July, August) at 10
Canadian stations for a 30-year period (1921-50), the fits being
generally acceptable. He also discussed the problem of associated
confidence limits for monthly rainfall amounts. Essenwanger (10)
made use of the natural logarithmic transformation along with the
concept that zero and trace rainfall amounts are determined from
the results of the transformation rather than included or totally
excluded from the application of the transformation.

The purposes of the present paper are (1) to report findings of a
study of the usefulness, from a distributional point of view, of
transformations of rainfall amounts for short periods (hours);
(2) to comment on applications of the censored gamma distribution
(as considered by Das); (3) to give examples of fits of the
generalized gamma distribution; and (4) to comment on the relation
of statistical parameters for periods of time divided into smaller
intervals of time; for example, what information about the
variability of annual rainfall can be deduced from the expression
of this rainfall as the sum of rainfall over rainy days, the number
of rainy days per period being a random quantity.

Previously, Shenton and Skees (34) reported a shorter study of the
effectiveness of transformations applied to rainfall amounts for
fixed periods and over storms, along with a reference to the
distribution of storm durations for hourly observations.

Remarks  on  the  Distributions 
Considered 

The  Gamma  Distribution 

This distribution has density

$$f(x; a, p, s) = \frac{((x-s)/a)^{p-1} \exp[(s-x)/a]}{a\,\Gamma(p)} \qquad (a, p > 0,\ s \text{ real}) \qquad (1)$$

for $x > s$, and zero density for $x < s$.

Moments:

$$\text{Mean} = ap + s$$

Variance:

$$\sigma^2 = a^2 p \qquad (2)$$

Maximum likelihood estimators $\hat{p}$ of $p$, $\hat{a}$ of $a$,
when $s = 0$:

$$\hat{a}\hat{p} = m_1', \qquad (3)$$

$$\ln \hat{p} - \psi(\hat{p}) = \ln (m_1'/g),$$

where $m_1'$, $g$ are the sample arithmetic and geometric means,
respectively, and $\psi$ is the psi-function. (Properties of these
estimators have been studied in detail in Bowman and Shenton (5)
and Shenton and Bowman (36).)

For most practical purposes, approximate maximum likelihood
estimators introduced by Thom (44) are quite adequate. Thom's
estimators are (with $y = \ln (m_1'/g)$)
Table 1. — Moments of hourly and daily rainfall for stations in the Southeast (1955-64)

                                                                                       Type III
                                            Mean      S.D.      Skewness    Kurtosis   criterion
Station               Period ¹   Type ²               (σ)       (√β1)       (β2)       (2β2 − 3β1 − 6)

Greensboro, N.C.      H          A          0.0050    0.0369    17.57       483.34      34.8
                      D          A           .1183     .3312     5.12        40.45      -3.8
                      H          P           .0733     .1225     5.21        44.81       2.3
                      D          P           .3666     .4989     3.06        16.70       -.6

Charleston, S.C.      H          A           .0066     .0558    19.71       603.36      35.2
                      D          A           .1580     .4466     4.85        34.42      -7.9
                      H          P           .1071     .1999     5.35        47.42       2.8
                      D          P           .5018     .6788     2.77        13.26      -2.5

Valdosta, Ga.         H          A           .0060     .0600    27.71      1322.15     334.4
                      D          A           .1433     .4398     5.53        44.94      -7.9
                      H          P           .1202     .2422     6.95        85.56      20.3
                      D          P           .5032     .7056     3.06        15.93      -2.2

Savannah, Ga.         H          A           .0062     .0526    16.99       402.52     -67.4
                      D          A           .1483     .4217     4.60        29.61     -10.4
                      H          P           .1098     .1941     4.24        27.58      -4.7
                      D          P           .4847     .6467     2.50        10.60      -3.5

[station name         H          A           .0054     .0396    16.71       431.98      20.7
missing]              D          A           .1291     .3536     4.96        38.75      -2.4
                      H          P           .0833     .1331     4.83        38.67       1.4
                      D          P           .4010     .5285     2.97        16.35        .2

[station name         H          A           .0048     .0327    18.90       686.10     294.3
missing]              D          A           .1140     .2745     4.04        25.95      -3.1
                      H          P           .0686     .1049     6.19        75.91      30.7
                      D          P           .3201     .3814     2.50        12.28       -.1

Roanoke, Va.          H          A           .0042     .0285    16.25       461.33     124.2
                      D          A           .1016     .2704     4.45        29.90      -5.6
                      H          P           .0614     .0908     5.13        49.17      13.5
                      D          P           .3073     .3974     2.58        12.35      -1.3

Knoxville, Tenn.      H          A           .0054     .0356    15.14       393.50      93.4
                      D          A           .1301     .3308     4.42        30.53      -3.6
                      H          P           .0745     .1105     4.80        43.37      11.5
                      D          P           .3697     .4717     2.72        13.84       -.5

[station name         H          A           .0053     .0437    21.55       807.66     216.3
missing]              D          A           .1279     .3661     4.72        31.52      -9.7
                      H          P           .0885     .1560     6.11        67.67      17.4
                      D          P           .4203     .5636     2.58        11.40      -3.2

[station name         H          A           .0059     .0515    18.29       502.51      -5.0
missing]              D          A           .1407     .4086     5.26        43.32      -2.3
                      H          P           .1121     .1970     4.48        33.41      [illegible]
                      D          P           .4532     .6292    [illegible] [illegible] [illegible]

Montgomery, Ala.      H          A           .0059     .0512    20.97       779.54     234.0
                      D          A           .1418     .4420     5.90        53.01      -4.3
                      H          P           .1093     .1929     5.49        57.98      19.4
                      D          P           .4793     .7063     3.36        19.64       -.7

¹ H, hourly; D, daily.
² A, all observations; P, nonzero or positive observations.
$$a^* p^* = m_1', \qquad p^* = \frac{1 + \sqrt{1 + 4y/3}}{4y}, \qquad (4)$$

or

$$3p^* = \frac{1}{\sqrt{1 + 4y/3} - 1},$$

and distributional properties are given in (36).

An alternative approximation to the maximum likelihood estimator
was given by Greenwood and Durand (16) in the form of a rational
fraction in y.

Greenwood and Durand's estimators are:

$$a' p' = m_1',$$

$$p' = (0.5000876 + 0.1648852y - 0.0544274y^2)/y, \qquad 0 \le y \le 0.5772$$

(maximum error = 0.0088 percent),

$$p' = \frac{8.898919 + 9.059950y + 0.9775373y^2}{y\,(17.79728 + 11.968677y + y^2)}, \qquad 0.5772 \le y \le 17 \qquad (5)$$

(maximum error = 0.0054 percent).

If the data to be fitted are J-shaped and may even contain zero
values of the variate (as is the case with hourly amounts of
rainfall for regions involving prolonged dry spells), then a
truncated or censored version of the distribution should be used.


The Censored Gamma Distribution

This distribution arises from the distribution in equation 1 (with
$s = 0$) truncated at $x = \delta$, where $\delta > 0$ is small.
Das (9) suggests fitting this curve with $\delta$ known, but making
use of the number of observations in the interval $(0, \delta)$.
The estimators he uses for the density

$$\phi(x; \mu, p) = \mu^p x^{p-1} \exp(-\mu x)/\Gamma(p),$$

using the maximum likelihood principle, are solutions of the
equations:

$$p/\mu = \sum x_j / N,$$

$$\ln \mu + n(\ln \delta - 1/p)/N = \psi(p) - \Big(\sum \ln x_j\Big)\Big/ N, \qquad (6)$$

where the number of observations in a sample of N falling in
$(0, \delta)$ is n, and the remaining observations
$x_1, x_2, \ldots, x_m \ge \delta$. These equations assume the
validity of the approximation ($\delta$ small $> 0$)

$$\int_0^{\delta} \exp(-\mu x)\, x^{p-1}\, dx = \delta^p / p.$$

A Thom-type approximation may be constructed for the censored gamma
distribution as follows:

$$p_T = \frac{1 - 2d + \big((1 - 2d)^2 + 4t/3\big)^{1/2}}{4t}, \qquad (7)$$

where

$$t = \ln \bar{x} - \frac{1}{N} \sum \ln x_j - d \ln \delta, \qquad \bar{x} = \frac{1}{N} \sum x_j, \qquad d = n/N.$$

The corresponding estimator of $\mu$ is derived from the first
equation in (6). Remarks on various aspects of censored and
truncated gamma distributions may be found in Cohen (6) and Johnson
and Kotz (21).
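The shape estimators of the two preceding subsections can be compared numerically. The sketch below is illustrative only: the function names are hypothetical, the psi-function is evaluated by a standard recurrence plus asymptotic series, the Greenwood and Durand constants are copied as printed in the text, and the censored sample at the end is invented for the example. In the censored case the sums run over the uncensored values but are divided by the total sample size N, as in equation 6.

```python
import math

def digamma(x):
    """Psi-function by upward recurrence and an asymptotic series."""
    total = 0.0
    while x < 6.0:
        total -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return total + math.log(x) - 0.5 / x - series

def exact_shape(y, lo=1e-8, hi=1e8):
    """Solve ln p - psi(p) = y by bisection (left side decreases in p)."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if math.log(mid) - digamma(mid) > y:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def thom_shape(y):
    """Thom's approximation, equation 4."""
    return (1.0 + math.sqrt(1.0 + 4.0 * y / 3.0)) / (4.0 * y)

def greenwood_durand_shape(y):
    """Greenwood and Durand's rational approximation, constants as printed."""
    if not 0.0 < y <= 17.0:
        raise ValueError("y must lie in (0, 17]")
    if y <= 0.5772:
        return (0.5000876 + 0.1648852 * y - 0.0544274 * y * y) / y
    numerator = 8.898919 + 9.059950 * y + 0.9775373 * y * y
    return numerator / (y * (17.79728 + 11.968677 * y + y * y))

def censored_thom_shape(uncensored, n_censored, delta):
    """Thom-type approximation for the censored gamma, equation 7."""
    n_total = n_censored + len(uncensored)
    d = n_censored / n_total
    x_bar = sum(uncensored) / n_total
    mean_log = sum(math.log(x) for x in uncensored) / n_total
    t = math.log(x_bar) - mean_log - d * math.log(delta)
    root = math.sqrt((1.0 - 2.0 * d) ** 2 + 4.0 * t / 3.0)
    return (1.0 - 2.0 * d + root) / (4.0 * t), t, d

for y in (0.1, 0.5, 1.0):
    print(y, exact_shape(y), thom_shape(y), greenwood_durand_shape(y))

# Hypothetical sample: six measured amounts, four readings below delta.
amounts = [0.02, 0.05, 0.11, 0.30, 0.80, 1.50]
p_t, t, d = censored_thom_shape(amounts, n_censored=4, delta=0.01)
print(p_t)
```

With no censored observations (d = 0) the Thom-type formula reduces to Thom's estimator with y = t, which is a convenient consistency check.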

The  Generalized  Gamma  Distribution 

The density of the distribution is

$$f(x; a, k, b) = \frac{b}{\Gamma(k)}\, a^{-bk}\, x^{bk-1} \exp[-(x/a)^b], \qquad x > 0;\ a, b, k > 0. \qquad (8)$$

The first two moments are:

$$\mu_r' = a^r\, \frac{\Gamma(k + r/b)}{\Gamma(k)}, \qquad r = 1, 2. \qquad (9)$$

Maximum likelihood estimators $\hat{a}$, $\hat{b}$, $\hat{k}$ are
admissible (in the sense of an acceptable physical interpretation)
solutions of the equations

$$-Nk + \sum_{i=1}^{N} (x_i/a)^b = 0,$$

$$N/b + k \sum_{i=1}^{N} \ln (x_i/a) - \sum_{i=1}^{N} (x_i/a)^b \ln (x_i/a) = 0, \qquad (10)$$

$$-N\psi(k) + b \sum_{i=1}^{N} \ln (x_i/a) = 0,$$

based on a random sample $(x_1, x_2, \ldots, x_N)$.

Estimation is accomplished through a search of b in

$$-\psi(k) + b \Big[\sum \ln x_i\Big] \Big/ N - \ln \Big[\sum x_i^b\Big] + \ln Nk = 0 \qquad (11a)$$

where

$$k = -\frac{1}{b} \left[ \frac{\sum \ln x_i}{N} - \frac{\sum x_i^b \ln x_i}{\sum x_i^b} \right]^{-1}$$

as given by Hager and Bain (18). The question of the existence of
solutions is not without difficulties. Remarks on the distribution
may be found in papers of Cohen (7) and Johnson and Kotz (27).
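As a check on the density and moments of the generalized gamma distribution, the following sketch evaluates the raw moments from the gamma-function formula and compares the mean with a direct numerical integration of the density. The function names, the parameter values, and the midpoint integration rule are all choices made for this illustration and do not come from the paper.

```python
import math

def gen_gamma_pdf(x, a, k, b):
    """Generalized gamma density, equation 8."""
    return (b / math.gamma(k)) * a ** (-b * k) * x ** (b * k - 1.0) \
        * math.exp(-((x / a) ** b))

def gen_gamma_raw_moment(r, a, k, b):
    """r-th moment about the origin: a^r Gamma(k + r/b) / Gamma(k)."""
    return a ** r * math.gamma(k + r / b) / math.gamma(k)

def numeric_moment(r, a, k, b, upper_factor=100.0, steps=100000):
    """Midpoint-rule integral of x^r f(x) on (0, upper_factor * a)."""
    width = upper_factor * a / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        total += x ** r * gen_gamma_pdf(x, a, k, b)
    return total * width

# Hypothetical parameter values for illustration.
a, k, b = 1.5, 2.0, 0.8
mean_formula = gen_gamma_raw_moment(1, a, k, b)
mean_numeric = numeric_moment(1, a, k, b)
variance = gen_gamma_raw_moment(2, a, k, b) - mean_formula ** 2
print(mean_formula, mean_numeric, variance)
```

Setting b = 1 recovers the ordinary gamma distribution of equation 1 with s = 0 and p = k, for which the mean is ak and the variance is a²k.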


Other  Distributions 

General distributions. — A few distributions applicable to rainfall
amounts have appeared in hydrology journals. These distributions
are variants of well-known cases in statistical literature.
Fisher's (11) distribution and Slade's (39) distribution are
versions of the lognormal distribution. For rainfall over arid
regions, Fisher (13) suggests that the total rainfall over a given
month or year arises from a random number of showers, the number
being distributed as a Poisson variate with mean m. There is now a
nonzero probability of no rain, namely exp(−m). The moment
generating function (parameter t) of the variate, assuming that the
rainfall for a single shower has density exp(−x/a) dx/a, is

$$e^{-m} \left[ 1 + \frac{m}{1!}\, \frac{1}{(1 + at)} + \frac{m^2}{2!}\, \frac{1}{(1 + at)^2} + \cdots \right],$$

which leads to the density

$$\phi(x; m, a) = \exp(-m - x/a)\, \frac{y}{2x}\, I_1(y), \qquad x > 0,$$

with a point mass exp(−m) at the origin, where
$y = 2\sqrt{mx/a}$, and $I_1(y)$ is a Bessel function with
imaginary argument. The distribution has not been widely applied
because of its inherent complexity.
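The shower model just described is easy to explore numerically. The sketch below is an illustration under stated assumptions, not a procedure from the paper: it evaluates the continuous part of the density from its power series (which is equivalent to the Bessel-function form), integrates it to confirm that the continuous mass is 1 − exp(−m), and simulates totals as a Poisson number of exponential showers. The parameter values, function names, and random seed are arbitrary.

```python
import math
import random

def shower_density(x, m, a, tol=1e-16, max_terms=500):
    """Continuous part of the density for x > 0:
    exp(-m - x/a) * sum over n >= 1 of (m/a)^n x^(n-1) / (n! (n-1)!)."""
    term = m / a  # n = 1 term of the series
    total = term
    for n in range(1, max_terms):
        term *= (m / a) * x / ((n + 1) * n)  # ratio of term n+1 to term n
        total += term
        if term < tol * total:
            break
    return math.exp(-m - x / a) * total

def integrate(func, upper, steps=20000):
    """Midpoint rule on (0, upper)."""
    width = upper / steps
    return width * sum(func((i + 0.5) * width) for i in range(steps))

def simulate_total(m, a, rng):
    """One period total: Poisson(m) showers, each exponential with mean a."""
    limit = math.exp(-m)
    showers = 0
    product = rng.random()
    while product > limit:  # Knuth's multiplication method for Poisson counts
        showers += 1
        product *= rng.random()
    return sum(rng.expovariate(1.0 / a) for _ in range(showers))

m, a = 2.0, 0.5  # hypothetical parameter values
mass = integrate(lambda x: shower_density(x, m, a), upper=40.0 * a)
mean = integrate(lambda x: x * shower_density(x, m, a), upper=40.0 * a)

rng = random.Random(12345)
totals = [simulate_total(m, a, rng) for _ in range(20000)]
dry_fraction = sum(1 for value in totals if value == 0.0) / len(totals)
print(mass, 1.0 - math.exp(-m), mean, m * a, dry_fraction)
```

The integrated mean agrees with ma, the product of the mean number of showers and the mean amount per shower, and the simulated dry fraction is close to exp(−m).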

Transformed distributions. — New distributions can be derived by
carrying out a transformation of the variate. An objective here is
to induce normality. Johnson (20) has studied this aspect in
detail. The transformations he uses are

$$Z = \gamma + \delta \ln x,$$

$$Z = \gamma + \delta \ln \big(x/(1 - x)\big),$$

$$Z = \gamma + \delta \sinh^{-1} x,$$

where $x = (X - \xi)/\lambda$ and $X \ge \xi$. In general, no single
transformation is adequate to induce normality, but combinations of
them can be quite effective, although rather complicated.
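The three transformations above translate directly into code. This is a minimal sketch with illustrative function names; the labels S_L, S_B, and S_U in the comments are the names commonly given to Johnson's lognormal, bounded, and unbounded systems, and the domain checks are additions for safety rather than part of the paper.

```python
import math

def standardize(value, xi, lam):
    """x = (X - xi) / lambda."""
    return (value - xi) / lam

def johnson_log(value, gamma, delta, xi=0.0, lam=1.0):
    """S_L system: Z = gamma + delta * ln x, defined for x > 0."""
    x = standardize(value, xi, lam)
    if x <= 0.0:
        raise ValueError("requires X > xi")
    return gamma + delta * math.log(x)

def johnson_bounded(value, gamma, delta, xi=0.0, lam=1.0):
    """S_B system: Z = gamma + delta * ln(x / (1 - x)), defined for 0 < x < 1."""
    x = standardize(value, xi, lam)
    if not 0.0 < x < 1.0:
        raise ValueError("requires xi < X < xi + lambda")
    return gamma + delta * math.log(x / (1.0 - x))

def johnson_unbounded(value, gamma, delta, xi=0.0, lam=1.0):
    """S_U system: Z = gamma + delta * arcsinh(x), defined for all x."""
    return gamma + delta * math.asinh(standardize(value, xi, lam))

# Hypothetical rainfall amount (inches) and parameter values.
amount = 0.40
print(johnson_log(amount, gamma=1.0, delta=0.8),
      johnson_bounded(amount, gamma=0.0, delta=1.0, xi=0.0, lam=10.0),
      johnson_unbounded(amount, gamma=-0.5, delta=1.2, xi=0.0, lam=0.25))
```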


The Censored Gamma Distribution

A problem that is encountered with some rainfall regimes (in
particular the southeastern States) is the relative abundance of
dry days or dry hours. Moreover, in such circumstances the
occurrence of zero rainfall is not readily distinguishable
instrumentally from the occurrence of small amounts or traces of
rainfall. However, if a gamma distribution is hypothesized, then
experience (see, for example, Thom (44), Mooley and Crutcher (32))
indicates the shape parameter p (see equation 1) may be fractional,
so that the curve of distribution is J-shaped. Evidently then, the
usual procedures for estimating p will be greatly influenced by the
frequency of small variate values, which in the present situation
are least reliable. To overcome this type of problem, the censoring
approach introduced by Das (9) suggests itself. In the southeastern
United States, 13 weather stations reporting hourly observations
were fitted with the censored gamma distribution using the
truncation value δ = 0.01 and the method of maximum likelihood. The
analysis was carried through utilizing the conversational
programming system (CPS) on the University of Georgia IBM 360/65
computer. The chi-square and Kolmogorov-Smirnov goodness-of-fit
tests were applied. A summary of results is given in table 2 and a
detailed example is shown in table 3. The fits are reasonably
satisfactory, the best being that for Charleston, whereas the case
of Nashville leaves much to be desired. Table 3 shows details of
the estimators and fitted distribution for positive daily rainfall
for Charleston. Table 4 shows the distribution details for hourly
rainfall at Charleston with a rather poor fit as indicated by the
large value of χ². The evidence provided by the data suggests that
the censored gamma distribution may yield satisfactory fits for
positive daily rainfall amounts in situations where dry days are
fairly common.

Transformed  Rainfall  Amounts 

Attempts at discovering a single statistical distribution which
would fit rainfall regimes of all types have not been completely
successful, and the situation deteriorates rapidly as smaller
observational periods are considered. This is doubtless due to the
complex physical events which are a hidden part of the aggregate we
label rainfall. It would indeed be remarkable if a mere statistical
descriptive could be invoked for the worldwide realizations of
rainfall amounts over different periods. In this connection, one
may recall a remark of Bergeron (4), namely, "I have always been
looking for maps (weather) containing what I call a one-factor
rain, and it has always been very difficult to find typical
one-factor rains from precipitation measurements."

Various  authors  have  made  massive  studies  of 


Table 2. — Application of the censored gamma distribution to positive daily rainfall for stations in the
Southeast (1955-64)

[Mean and variance are those of rainfall; μ̂ is the maximum likelihood estimator of μ; p*, p_T, and p̂ are the
moment, Thom-type, and maximum likelihood estimators of p, respectively; N = sample size; df = degrees of
freedom; the last column is the significance level of the observed chi-squared value.]

Station               Mean      Variance   p*        p_T       p̂         μ̂         N      df   χ²      Level

Montgomery, Ala.      0.37237   0.19216    0.72158   0.60918   0.58968   1.58359   1035   16   30.88   0.99
Macon, Ga.             .34002    .15444     .74861    .66928    .65217   1.91803   1077   14   21.91    .95
Jacksonville, Fla.     .36708    .17987     .74916    .64344    .62532   1.70348   1093   18   32.80    .99
Knoxville, Tenn.       .32882    .14043     .76993    .69110    .67459   2.05152   1270   15   32.47    .995
Roanoke, Va.           .28697    .12427     .66271    .65798    .64036   2.23143   1212   11   15.42    .90
Florence, S.C.         .34538    .15590     .76517    .69590    .67966   1.96787   1091   17   35.25    .995
Bristol, Tenn.         .30201    .11844     .77012    .76344    .74891   2.47972   1310   11   20.51    .975
Nashville, Tenn.       .34715    .15480     .77851    .67585    .65896   1.89819   1151   18   37.71    .999
Elkins, W. Va.         .25172    .09005     .70367    .72943    .71383   2.83578   1584   16   21.23    .90
[name missing]         .38183    .18312     .79618    .77865    .76496   2.00339    994   16   23.59    .95
Savannah, Ga.          .37653    .17840     .79470    .67398    .65699   1.74486   1066   16   26.33    .975
Charleston, S.C.       .39189    .19547     .78570    .72015    .70477   1.79837   1097   17   19.93    .80
Greensboro, N.C.       .31789    .13992     .72225    .62587    .60667   1.90843   1159   16   23.13    .90
rainfall, doubtless hoping to find a unifying model which would
typify and simplify rainfall observations. Thus Markovic studied
annual rainfall and river flow at some 2,506 stations, using the
normal, log-normal, and gamma (with various parameter forms)
distributions, and maximum likelihood estimators. No single
distribution had general applicability; in some regimes, all the
distributions were satisfactory, whereas in others one or another
distribution was superior.

When  the  period  of  observation  is  smaller 
(months  or  weeks),  the  quest  for  a  simple,  generally 
applicable  model  becomes  much  more  difficult, 
although  monthly  rainfall  can  often  be  satisfactorily 
fitted  by  the  gamma  distribution.  (See  Thom  (44), 
Mooley  and  Crutcher  {32),  for  example.)  Our  own 
work  (Shenton  and  Skees  (37,  38)  and  other  unpub- 
lished data)  indicates  that  daily  and  hourly  amounts 
are  rarely  satisfactorily  accounted  for  by  the  gamma 
distribution.  A  glance  at  table  1  (as  mentioned 
earlier)  shows  the  extreme  skewness  and  kurtosis 
(as  measured  by  Vy87,  ftz;  for  the  normal  distri- 
tion  these  are  0  and  3,  respectively)  and  the  depar- 
ture from  the  gamma  distribution  (a  chart  of  the 
values  of  V^,  f3>  for  the  cube  root  of  daily  rain- 
fall for  Georgia  stations  is  given  on  page  92  of  (37)) 
for  hourly  and  daily  observations  at  selected 
stations. 

A typical rainfall distribution is likely to have a
long tail (corresponding to a paucity of occasions
upon which large amounts of rain fall) and perhaps
a low mean. Root transformations should therefore
help to adjust the skewness. The cube root (Stidd
(40), Kendall (23)) has acquired a charisma which
is not entirely justified. Of course it is well known
from the Wilson-Hilferty transformation (see (51))
that the cube root of a gamma variate is approximately
normally distributed. (There are also refinements
of this transformation.) The transformation
x^α, α > 0, has been studied by us for α = 1/10 (1/10)
9/10 as well as the limiting form ln x. The latter
takes the positive reals into the reals, so that
it may drastically reduce skewness; in fact, it may
overcorrect for skewness (see Manning (20)), in
which case the transformation ln(x + c) with c > 0
may be tried. Unfortunately, there is no simple
tractable method of determining an optimum c.
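
As a rough illustration of how a pencil of root transformations acts on skewness and kurtosis, the following Python sketch computes the sample values of √β₁ and β₂ for x^α and for ln(x + c). The function names, the synthetic gamma sample (shape 0.7, scale 0.5), the seed, and the choice c = 0.01 are assumptions made only for this illustration; they are not taken from the rainfall records discussed here.

```python
import math
import random


def shape_stats(values):
    """Return (sqrt_beta1, beta2): signed sample skewness and sample kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def pencil(values, exponents, c=None):
    """Shape statistics of x**alpha for each alpha, plus ln(x + c) when c is given."""
    rows = []
    for alpha in exponents:
        skew, kurt = shape_stats([v ** alpha for v in values])
        rows.append((f"x^{alpha:.3g}", skew, kurt))
    if c is not None:
        skew, kurt = shape_stats([math.log(v + c) for v in values])
        rows.append((f"ln(x+{c})", skew, kurt))
    return rows


if __name__ == "__main__":
    rng = random.Random(1971)          # fixed seed, illustration only
    sample = [rng.gammavariate(0.7, 0.5) for _ in range(20000)]
    for label, skew, kurt in pencil(sample, [1.0, 0.5, 1 / 3, 0.2, 0.1], c=0.01):
        print(f"{label:10s} sqrt(beta1) = {skew:6.2f}   beta2 = {kurt:6.2f}")
```

Running the sketch on real positive rainfall amounts only requires replacing `sample` with the observed values; trying several values of `c` is one crude way to look for a workable shift in ln(x + c).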

Considering  the  extreme  skewness  and  kurtosis 


Table 3. Daily positive rainfall amounts for
Charleston, S.C., fitted by censored gamma

x            Observed   Expected values   χ² ¹    Cumulative χ²
0.00-0.65    (class entries not legible in source)
0.65-0.70        21         21.5312        .01       11.36
0.70-0.75        26         19.2684       2.35       13.71
0.75-0.80        16         17.2677        .09       13.80
0.80-0.85        14         15.4939        .14       13.95
0.85-0.90        15         13.9174        .08       14.03
0.90-0.95        12         12.5135        .02       14.05
0.95-2.00       127        102.4599       5.88       19.93

¹ χ² is the contribution to chi-square.
Mean = 0.3919    Var. = 0.1955    N = 1097
p* = 0.7857    p (Thom) = 0.7201
p = 0.7048    a = 1.7984


of rainfall amounts per day (hour), even conditional
upon the observed hour being rainy, it is not surprising
that effective simple transformations are
difficult to find. In addition, we are looking for a
transformation formula which can be used to set up
confidence bounds or to compare rainfall amounts
at two neighboring locations, as is the case in many
weather modification experiments. A point-by-point
transformation to a specified probability integral
(chi-square, normal, for example) is not the goal.

There are two possibilities, or so it seems. We
can aim at a transformation of rainfall amount
x (> 0) which approximately induces a gamma distribution
or which produces ideally a normal distribution.
In both cases, we must remember that
the given data is merely a sample and is therefore
subject to various kinds of error. In addition, the
length of time of recorded observations is not a
disposable parameter; very long weather records
can become suspect because of changes of location
Table 4. Fit of censored gamma distribution to Greensboro, N.C., total hourly rainfall

x           Observed   Expected values   χ² contribution   Cumulative χ²   P(Obs)   P(Exp)     D
0.00-0.02     83534         83540             0.001             0.00       0.9523   0.9524   0.0001
0.02-0.05      1779          1815              .720              .72        .9726    .9730    .0005
0.05-0.08       779           798              .461             1.18        .9814    .9821    .0007
0.08-0.11       511           462             5.301             6.48        .9873    .9874    .0001
0.11-0.14       280           298             1.044             7.53        .9905    .9908    .0003
0.14-0.17       193           204              .569             8.10        .9927    .9931    .0005
0.17-0.20       130           145             1.536             9.63        .9941    .9948    .0006
0.20-0.23       101           106              .220             9.85        .9953    .9960    .0007
0.23-0.26        62            79             3.576            13.43        .9960    .9969    .0009
0.26-0.29        67            60              .937            14.36        .9968    .9976    .0008
0.29-0.32        55            46             1.977            16.34        .9974    .9981    .0007
0.32-0.35        42            35             1.343            17.68        .9979    .9985    .0006
0.35-0.38        27            27              .004            17.69        .9982    .9988    .0006
0.38-0.41        24            21              .315            18.00        .9984    .9990    .0006
0.41-0.44        18            17              .077            18.08        .9987    .9992    .0006
0.44-0.47        14            13              .032            18.11        .9988    .9994    .0006
0.47-2.00       104            55            44.622            62.73       1.0000   1.0000    0

N = 87720    Mean = 0.0050    σ² = 0.0014

b = 0.02    a = 5.583    p = 0.028

χ²(14) = 38    χ²(computed) = 63


or  even  a  change  of  long-range  weather  patterns. 
So  the  data  is  neither  infinite  in  extent  nor  error-free. 
We  assume,  however,  that  for  practical  purposes 
the  sample  size  is  large  enough  to  experiment  with 
root  transformations  designed  to  reduce  the  skew- 
ness  and  kurtosis.  An  alternative  would  be  to  hypo- 
thesize a  distributional  model,  use  some  appealing 
and  efficient  procedure  for  estimating  the  param- 
eters with  an  acceptance  level  for  some  goodness-of- 
fit  test,  and  carry  out  a  root  transformation  on  it; 
however,  this  technique  would  be  uneconomical  and 
time  consuming  even  on  a  moderate-sized  digital 
computer. 

Some root transformations run on positive hourly
rainfall are shown in figures 2 and 3. The effects of
the pencil of transformations on a gamma variate
are shown in figure 4. Transformations applied to
precipitation are given in the following tabulation:


 1. ln(1 + x) ¹
 2. ln(2 + x)
 3. (ln(1 + x))²
 4. ln(1 + x²)
 5. (1 + x)^(1/2) − x^(1/2)
 6. (1 + x)^(1/3) − x^(1/3)
 7. (1 + x)^(1/4) − x^(1/4)
 8. 1/(1 + x)
 9. (not legible in source)
10. (2 + x)^(1/2)
11.-22. Powers, and at 16 the logarithm, of linear forms 1 + cx
        (coefficients and exponents not legible in source)
23.-31. (ln(1 + x))^α, α = 0.1 (0.1) 0.9
32.-34. Powers of ln(x) (exponents not legible in source)
35. ln(x)
36.-39. x^α, α = 0.1 (0.1) 0.4
40.-45. (not legible in source)
46. sinh(x)
47. cosh(x)
48. arctan(x)
49. 1/cosh(x)
50. tanh(x)
51.-70. x^α − (max(x) − x)^α, α = 0.1 (0.1) 2.0

¹ x is rainfall amount.

x^α (α = 1.0 (−0.1) 0.1), including the limiting
form ln x, produces distributions for which β₁, β₂
remain very close to the Type III (gamma) parabola
2β₂ − 3β₁ − 6 = 0. (Note that one axis is labelled √β₁,
not β₁.) In the one case, there is an α which
induces a gamma distribution, whereas in the other
case even the logarithmic transformation does not
succeed. Some further examples of root transformations
on daily and hourly positive rainfall for stations
in the Southeast are given in tables 5 and 6.

In both cases, that is for daily and hourly rainfall,
a 10th-root or logarithmic transformation drastically
reduces the skewness and kurtosis to within a small
error of the normal values (√β₁ = 0, β₂ = 3).
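
The Type III criterion 2β₂ − 3β₁ − 6 used in the tables can be evaluated directly from a (√β₁, β₂) pair. The short Python sketch below is an illustration only: the function names are hypothetical, and the gamma moment formulas β₁ = 4/shape and β₂ = 3 + 6/shape are the standard ones for a gamma variate rather than anything quoted from this paper. It shows that the criterion vanishes for every gamma shape and for the normal point (0, 3).

```python
def type_iii_criterion(sqrt_beta1, beta2):
    """2*beta2 - 3*beta1 - 6; zero on the Type III (gamma) parabola."""
    return 2.0 * beta2 - 3.0 * sqrt_beta1 ** 2 - 6.0


def gamma_shape_stats(shape):
    """Exact (sqrt_beta1, beta2) of a gamma variate with the given shape parameter."""
    return 2.0 / shape ** 0.5, 3.0 + 6.0 / shape


if __name__ == "__main__":
    for shape in (0.5, 0.7, 2.0, 10.0):
        print(shape, round(type_iii_criterion(*gamma_shape_stats(shape)), 12))
    # Normal distribution: sqrt(beta1) = 0, beta2 = 3.
    print("normal", type_iii_criterion(0.0, 3.0))
    # A pair read from table 8 (sqrt(beta1) = 0.55, beta2 = 2.77).
    print("table 8 pair", round(type_iii_criterion(0.55, 2.77), 4))
```

Negative values of the criterion, as in the tabulated results, place a transformed sample below the gamma parabola in the (β₁, β₂) plane.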

Whether there exists a simple transformation to
nullify the skewness of all skew Pearson curves
(that is, the family with density y, where
(1/y) dy/dx = (x + d)/(ax² + bx + c)) has not been completely
resolved; the root transformation fails with extreme
cases. Similarly, we have not found a simple pencil
of transformations to take a given highly-skewed

Table 5. Root transformations ¹ on positive daily
data for several stations in Georgia

Station                 α     Skewness √β₁   Kurtosis β₂   Sample size (days)
Augusta                1.0        3.1           19.8            9742
                        .2         .3            2.4            9742
                        .3        −.2            2.2            9742
(name not legible)     1.0        4.6           44.8            5715
                        .3        −.4            2.7            5717
Bellville              1.0        1.7            6.2             646
                        .2        −.2            2.4             646
(name not legible)     1.0        4.3           35.5            6781
                        .3        −.2            2.7            6781
Dublin                 1.0        5.1           59.9            3941
                        .3        −.2            3.4            3941
(name not legible)     1.0        3.4           22.8            9424
                        .3         0             1.9            9424
Savannah               1.0        5.4           50.3            5930
                        .4        −.3            2.3            5930

¹ Transformations are x^α.


Figure 2. Positive hourly rainfall, Charleston, S.C.

distribution over to the gamma form. If such existed
and y = x^b was approximately a gamma variate,
then the generalized gamma distribution mentioned
under "The Generalized Gamma Distribution"
would fit, at least approximately, the original data.
A number of attempts to fit this distribution to positive
hourly rainfall data have yielded unsatisfactory
results. Whether this is surprising or not depends
to a certain extent on one's appraisal of skewness.
By contrast, in the statistical literature there are
several simple transformations for inducing normality
which we mention:

(1) √(2χ²) for a χ²-variate (13).

(2) ∛(χ²/ν) for a χ²-variate (51).

(3)  V(.r  +  a)  (x  ^  —  a)  for  a  variate  with  den- 
sity A.v'       exp  (- A.v)  .V  >  0  (8). 

(4)  zoT  z  -  {3z  -  r)IAn  where  z^k  In  (1-rr)/ 
(1  —  r)  for  a  correlation  coefticient  (/..  1^). 

(5)  \  v  (»/■  In  (.v  +  m)  tor  .i  Poisson  \     uUc  \Mth 
a  mean  m  (4t>). 
Figure 3. Positive hourly rainfall, Valdosta, Ga.


(6) x^(2/3) for a Poisson variate (Anscombe in his
contribution to the discussion, in Hotelling (19)).

(7) transformations for t and F (26).


Notice that in each of these cases normality is
achieved for an extreme value of a parameter; thus,
(1) and (2) depend on the number of degrees of
freedom being large, (5) and (6), on a large mean.
Reference may also be made to Johnson's (20) logarithmic
and exponential transformation applied to
Pearson curves, which in general reduce the skewness
and kurtosis. Again, in passing, we mention
an observation made by Curtiss (8), that a single-valued
function f(x) will never transform a discrete
variate into a continuous variate, normal or
otherwise.

Some comments on transforming gamma distributions
into near normal form should be noted.
It has been shown (37, 38) that a transformation
T_α = (x + a)^α can always be found which "reduces"
the skewness to zero in the case of the gamma
density

$$ f(x) = y_0 (x + a)^{p} \exp(-\lambda x), \quad x \ge -a; \qquad f(x) = 0, \quad x < -a. \tag{12} $$

In fact, if the skewness for equation 12 is β₁, then
the transformation induces a distribution with
the parameters

Mean:

$$ \mu_1'(T_\alpha) = t^{\alpha}\left\{1 - \frac{1}{9t} + \frac{2}{243t^{2}} + \frac{41}{6561t^{3}} + \cdots\right\} \tag{13a} $$

Standard deviation: σ(T_α) (equation 13b; the expansion is not legible in the source)

Skewness:

$$ \beta_1(T_\alpha) = 0 \tag{13c} $$

Kurtosis: β₂(T_α) (equation 13d; the expansion is not legible in the source)

where t = 4/β₁, and the root transformation is
given approximately by

$$ \alpha = \frac{1}{3} - \frac{4}{81t} - \frac{4}{243t^{2}} + \frac{148}{19683t^{3}} + \cdots \tag{14} $$
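
The exponent that removes the skewness of a gamma variate can also be found numerically, which gives a check on the leading terms of the expansion for the root. The Python sketch below is an illustration under stated assumptions: it takes a gamma variate with shape t, unit scale and origin zero, uses the exact moments Γ(t + kα)/Γ(t), and locates the zero of the third central moment by bisection. The function names and the bracketing interval (0.05, 0.6) are choices made here, and only the first three terms of the series, 1/3 − 4/(81t) − 4/(243t²), are compared.

```python
import math


def third_central_moment(alpha, t):
    """Third central moment of (G/t)**alpha for G ~ gamma(shape t, scale 1)."""
    def raw(k):
        # E[(G/t)**(k*alpha)] from the exact gamma-function ratio.
        return math.exp(math.lgamma(t + k * alpha) - math.lgamma(t)
                        - k * alpha * math.log(t))
    m1, m2, m3 = raw(1), raw(2), raw(3)
    return m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3


def zero_skewness_exponent(t, lo=0.05, hi=0.6, iterations=80):
    """Bisection for the alpha in (lo, hi) at which the skewness of G**alpha vanishes."""
    f_lo = third_central_moment(lo, t)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = third_central_moment(mid, t)
        if (f_mid < 0.0) == (f_lo < 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def series_exponent(t):
    """First three terms of the asymptotic series for the root exponent."""
    return 1.0 / 3.0 - 4.0 / (81.0 * t) - 4.0 / (243.0 * t * t)


if __name__ == "__main__":
    for t in (1.0, 2.0, 5.0, 20.0, 100.0):
        print(f"t = {t:6.1f}  numerical = {zero_skewness_exponent(t):.6f}"
              f"  series = {series_exponent(t):.6f}")
```

For large t the two columns agree closely and both approach the cube root law; for small t (highly skewed gamma variates) the truncated series is only a rough guide and the numerical root is the safer value.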

It is noteworthy that for large t (small β₁ or large
value for the degrees of freedom) the value of α
approximates the cube root law. A table including
α, σ(T_α) and β₂(T_α) for √β₁ = 0.10 (0.01) 10.0
(0.1) 22 is given in the reference quoted; applications
are also included. In summary, if we have
data following a gamma distribution (for simplicity,
assume the origin is x = 0, or a = 0) then the variate

n^^^l^J^  (15) 

has  zero  mean,  unit  standard  deviation,  zero  skew- 
ness, and  is  approximately  a  Pearson  Type  II  dis- 
Table 6. Root transformations ¹ on positive hourly data for several stations

The seven value columns correspond to successive transformations x^α and,
in the last column, ln x. The α column labels are not legible in the source.

Station (N)                            Statistic    (1)    (2)    (3)    (4)    (5)    (6)   ln x
(name not legible) (N = 5955)            √β₁        5.2    3.1    2.5    2.0    1.0    0.8    0.5
                                         β₂        44.8   17.6   12.7    9.1    3.8    3.0    2.5
Charleston, S.C. (N = 5392)              √β₁        5.4    3.2    2.6    2.2    1.0     .8     .5
                                         β₂        47.4   18.3   13.2    9.5    3.8    3.0    2.5
Valdosta, Ga. (N = 4360)                 √β₁        6.9    3.7    3.0    2.4    1.1     .8     .5
                                         β₂        85.6   26.2   17.5   11.8    4.1    3.2    2.6
Savannah, Ga. (N = 4940)                 √β₁        4.2    2.8    2.3    2.0    1.0     .7     .5
                                         β₂        27.6   13.2   10.1    7.7    3.6    2.9    2.4
Elkins, W. Va. (N = 8326)                √β₁        5.3    3.2    2.6    2.2    1.2     .9     .7
                                         β₂        48.5   18.8   13.6    9.9    4.3    3.4    2.8
Nashville, Tenn. (N = 5671)              √β₁        4.8    2.9    2.4    1.9     .9     .6     .4
                                         β₂        38.7   15.7   11.4    8.3    3.5    2.8    2.4
(name not legible) (N = 6108)            √β₁        6.2    3.2    2.5    2.0     .9     .6     .4
                                         β₂        75.9   21.8   14.5    9.9    3.8    3.0    2.5
(name not legible) (N = 4994)            √β₁        6.1    3.5    2.9    2.3    1.1     .8     .5
                                         β₂        64.0   22.8   15.9   11.2    4.2    3.3    2.6
(name not legible) (N = 6047)            √β₁        5.1    2.9    2.4    1.9    1.0     .7     .5
                                         β₂        49.2   17.0   12.1    8.7    3.7    3.0    2.5
(name not legible) (N = 6390)            √β₁        4.8    2.7    2.3    1.8     .9     .6     .4
                                         β₂        43.4   15.3   11.0    8.0    3.5    2.8    2.4

¹ Transformations are x^α and ln x.


tribution (the percentage points of which have been
tabulated for most practical purposes in Johnson
and others (22)). In particular, the θth percentage
point of the original distribution (equation 12) is
given by

$$ x_\theta = \frac{1}{k}\left\{\left(k^{\alpha}\mu_1'(\alpha) + k^{\alpha}\sigma_\alpha T_\theta\right)^{1/\alpha} - \frac{4}{\beta_1}\right\} \tag{16} $$

where

k = 2/(σ√β₁),
p + 1 = 4/β₁,

T_θ is the θth percentage point of the transformed
variate and μ'_α, σ_α are given in equation 13.

We mention briefly another distribution belonging
to the Pearson system which can be approximately
normalized. Assume that a transformation
of the form x^α produces a distribution with β₁ = 0
and kurtosis lying between 1.8 and 3. (See previously
discussed transformations of daily positive
rainfall.) Then this first transformation leads to
the Type II (standardized) Pearson curve, with
density

$$ y = y_0 (1 - x^{2})^{m}, \quad |x| \le 1 \quad (m > -1); \qquad y = 0, \quad |x| > 1, \tag{17} $$

where y₀ = Γ(m + 3/2)/(Γ(1/2) Γ(m + 1)). In terms
of its kurtosis,

$$ m = \frac{5\beta_2 - 9}{2(3 - \beta_2)}, \qquad \beta_2 = \frac{6m + 9}{2m + 5}, \tag{18} $$

so that m = 0 corresponds to β₂ = 1.8 and m = ∞ to
β₂ = 3. Then it is always possible to find a transformation

$$ T_\alpha(x) = (1 + x)^{\alpha} - (1 - x)^{\alpha}, \qquad |x| \le 1, $$

such that its kurtosis is exactly three, so that the
variate T_α(x) is nearly normal (exactly so, as far as
the skewness and kurtosis are concerned). It turns
out that there are two real positive α roots to the
equation

$$ \mu_4(T_\alpha) = 3\mu_2^{2}(T_\alpha), $$

Figure  4.  —  Root  transformations  for  type  III  curves. 
with asymptotic expansions

$$ \alpha_s = \frac{3 - \sqrt{7}}{2} - \frac{33\sqrt{7} - 84}{112m} + O(m^{-2}), \tag{19a} $$

$$ \alpha_l = \frac{3 + \sqrt{7}}{2} + \frac{33\sqrt{7} + 84}{112m} + O(m^{-2}), \tag{19b} $$

or

α_s = 0.177,124 − 0.029,552/m
α_l = 2.822,875 + 1.529,552/m

approximately. Judging by the discrepancy of the
sixth standardized moment μ₆/σ⁶ from the moment
value 15 for the normal, it appears that α_s is superior.
In particular, it seems worth mentioning that
if x is a uniform random variate on (−1, 1), then

$$ T_\alpha(x) = \frac{(1 + x)^{\alpha} - (1 - x)^{\alpha}}{0.216807} \qquad (\alpha = 0.134912) \tag{20} $$

is approximately normal with central moment
parameters

σ(T) = 1,     β₁(T) = 0,

β₂(T) = 3,     μ₆(T) = 14.710.
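
The quoted constants for the uniform case can be checked by direct numerical integration. The Python sketch below is illustrative only: the function names and the midpoint rule with 200,000 cells are choices made here, not part of the original computations. It also encodes the Type II relation between m and the kurtosis, assuming the density (1 − x²)^m on |x| ≤ 1.

```python
import math


def type_ii_kurtosis(m):
    """Kurtosis beta2 of the Type II density proportional to (1 - x^2)^m."""
    return (6.0 * m + 9.0) / (2.0 * m + 5.0)


def type_ii_m_from_kurtosis(beta2):
    """Inverse relation: exponent m for a given kurtosis, 1.8 <= beta2 < 3."""
    return (5.0 * beta2 - 9.0) / (2.0 * (3.0 - beta2))


def uniform_transform_moments(alpha, n=200000):
    """Moments of (1+x)^alpha - (1-x)^alpha for x uniform on (-1, 1).

    Returns (standard deviation, beta2, mu6 / sigma^6), using a midpoint rule.
    The transform is odd in x, so the mean and odd moments are zero.
    """
    h = 2.0 / n
    s2 = s4 = s6 = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        t2 = ((1.0 + x) ** alpha - (1.0 - x) ** alpha) ** 2
        s2 += t2
        s4 += t2 * t2
        s6 += t2 * t2 * t2
    mu2, mu4, mu6 = s2 / n, s4 / n, s6 / n
    return math.sqrt(mu2), mu4 / mu2 ** 2, mu6 / mu2 ** 3


if __name__ == "__main__":
    print("m = 0 gives beta2 =", type_ii_kurtosis(0.0))
    sd, beta2, m6 = uniform_transform_moments(0.134912)
    print(f"sd = {sd:.6f}  beta2 = {beta2:.4f}  mu6/sigma^6 = {m6:.3f}")
```

The standard deviation returned for α = 0.134912 is the divisor appearing in equation 20, so dividing the transform by it gives a variate with unit standard deviation.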

Again, the discrepancy of the (2r)th central moment
of T_α from its normal value, for the root α_s, is given by

^l2r(Ta^{x))        ^  r(r-l)(r-2)(3-V7)'^ 

1,3  (2r-l)At.^  , 

5!  m- 


+  0(m-3)        (r=  1,  2,  .  .  .). 


(21) 


More general transformations have been applied
to daily and hourly rainfall amounts, and a list was
given on page 180. These transformations consist of
root transformations, logarithmic functions of
linear forms of the variable, and transformations
of the form x^α − (x_max − x)^α for α = 0.1 (0.1) 2.0. If
possible, on the grounds of simplicity, we would
prefer to use a root transformation rather than
others. However, the transformation x^α for varying
α applied to a gamma variate results in a (√β₁,
β₂) plot which is, to all intents and purposes,
parallel to this plot for the gamma distribution (see
fig. 4). Similarly, the same pencil of transformations
applied to a Pearson curve, for which all the
moments exist, produces a trace which is approximately
parallel to the Type III parabola. Hence, we
cannot expect to transform a distribution of the
type discussed to a gamma distribution using a
root transformation. However, as mentioned elsewhere,
a root transformation (including the extreme
form ln x) can always be found to reduce β₁ of a
gamma variate to zero.

In table 7, we have recorded the best normalizing
transformations from the list given previously;
"best" is interpreted as the simultaneous reduction
of β₁ to near zero and β₂ to near three. A glance at
the results encourages belief that in spite of the
initial extreme skewness and kurtosis, relatively
simple transformations do exist with the desired
property.

However, those unfamiliar with the field of meteorological
statistics are not aware of the great
variety of situations which comprise large-scale
weather phenomena. When we attempt to normalize
or gamma-ize rainfall data, we are confronted by a
space of distributions which goes far beyond
commonly accepted perturbed situations. The
very fact that the discriminatory diagram for
Pearson curves is usually quoted for √β₁ at most
as large as about 2, and that the percentage points
tables of Johnson and others (22) only include
√β₁ < 2, β₂ < 14, indicates the extreme nature of
the problem we are trying to deal with in the case
of rainfall. Our investigations so far do hold out
possibilities for finding simple transformations
which would either remove the skewness and
kurtosis, or approach, to an approximation, the
normal form itself (table 8).

The possibility of determining y = x^b so that y
is a gamma variate directed our attention to a
consideration of the generalized gamma distribution
as a possible distribution to fit rainfall amounts.
A few cases have been studied, which we now
briefly discuss.

The equation of the generalized gamma distribution
is given in equation 8 along with the equations
which lead to maximum likelihood estimates of the
three parameters a, b, k. Note that the distribution
takes on a half-normal form when b = 2, and reduces
to the usual gamma distribution when b = 1. The
Table 7. Best transformations for positive daily and hourly rainfall for southeastern stations (1955-64) ¹


Mean 

S.D. 

Skewness 

Kurtosis 

Type  III 

Station 

Transformations 

X 

& 

criterion 

2β₂ − 3β₁ − 6

Best  transformations  for  positive  hourly  rainfall 

^0.3 

0.6241 

0.2630 

0.48 

2.98 

-1.7 

rind  +  z)l" 

.3947 

.2698 

.75 

2.74 

-  1.2 

Ch3.rlcston  S  C 

.6882 

.2851 

.53 

2.81 

-  1.2 

[ln(l  +;c)l''« 

.4601 

.3002 

.71 

2.83 

-  1.9 

Valdosts  Ga. 

.6898 

.2814 

.65 

3.04 

-  1.2 

x"~ 

.7665 

.2100 

.36 

2.60 

-  1.2 

Savannsh,  Ga.  

.6772 

.2878 

.48 

2.62 

-  1.4 

x"-* 

.6184 

.3442 

.75 

3.13 

-  1.5 

Elkins  W  Va   

x"" 

.4988 

.2453 

.67 

2.92 

-  1.5 

.5802 

.2167 

.44 

2.53 

-  1.5 

Nashville  Tenn   

z"  ■■' 

.6455 

.2661 

.44 

2.67 

-  1.3 

rind  +  x)]°-^ 

.4162 

.2757 

.67 

2.83 

-  1.7 

.5441 

.2659 

.66 

3.10 

-  1.1 

x""' 

.6193 

.2306 

.39 

2.64 

-1.2 

„,         „  p 

.6458 

.2649 

.56 

2.96 

-  1.0 

rind  +  id" 

.4152 

.2759 

.77 

3.06 

-  1.6 

R  \/ 

x"* 

.5222 

.2781 

.76 

3.11 

-  1.5 

.5982 

.2421 

.50 

2.60 

-  1.5 

lvnrkv\/ill^      1  f^nn 

^0.4 

.5643 

.2975 

.70 

3.14 

-  1.2 

x"'' 

.6341 

.2552 

.43 

2.61 

-  1.3 

^0.3 

.6499 

.2740 

.50 

2.66 

-  1.4 

rind  +  x)i" 

.4211 

.2850 

.73 

2.84 

-  1.9 

^(1.4 

.5850 

.3237 

.78 

3.20 

-  1.4 

Athens,  Ga."^  

x"' 

.6383 

.3503 

.69 

3.09 

-  1.2 

x"-' 

.6938 

.2918 

.41 

2.55 

-  1.4 

^0.3 

.6607 

.2843 

.52 

2.73 

-  1.3 

find  +  xd"'' 

.4332 

.2962 

.72 

2.82 

-  1.9 

x"  "* 

.5989 

.3382 

.81 

3.38 

-  1.2 

x"-' 

.6634 

.2965 

.59 

2.87 

-  1.3 

Lin  V  -1  1  X ;  J 

.4370 

.3083 

.77 

2.90 

-2.0 

x"' ' 

.7217 

.0840 

.76 

3.00 

-  1.7 

rind  +  x)i"  - 

.5234 

.1199 

.86 

3.20 

-  1.8 

.7190 

.0803 

.65 

2.66 

-2.0 

x"' 

.7394 

.0953 

.76 

3.02 

-  1.7 

rind  +  x)!"  - 

.5489 

.1371 

.84 

3.10 

-  1.9 

rind  +  x)l"' 

.7354 

.0896 

.61 

2.59 

-  1.9 

Valdosls  Ga   

x"' 

.7463 

.0971 

.77 

3.16 

-  1.4 

ln(x) 

—  3.0067 

1.2589 

.50 

2.56 

-  1.6 

[ln(l  +  x)]"-^ 

.5587 

.1398 

.83 

3.16 

-  1.7 

x"  ' 

.7405 

.0970 

.73 

2.85 

-  1.9 

rind  +x)]''-^ 

.5506 

.1396 

.81 

2.95 

-2.1 

(l  +  x)"-'-x"'' 

.5369 

.1346 

-.72 

2.68 

-2.2 

Elkins,  W.  Va  

x"' 

.7015 

.0719 

.92 

3.39 

-  1.8 

rind +x)]''> 

.6996 

.0695 

.84 

3.08 

-2.0 

ln(x) 

-  3.5959 

.9883 

.72 

2.78 

-2.0 

x"  ' 

.7318 

.0857 

.64 

2.82 

-  1.6 

rind  +x)]"-^ 

.5377 

.1228 

.74 

2.97 

-  1.7 

rin(l  +  x)]»' 

.7287 

.0817 

.53 

2.51 

-  1.8 

Best  transformations  for  positive  hourly 

rainfall 

Bristol,  Tenn  

.7245 

.0775 

.64 

3.00 

-  1.2 

[ln(l+x)]"' 

.5268 

.1105 

.76 

3.20 

-  1.3 

.5595 

.1080 

-.70 

2.97 

-1.5 

Florence,  S.C  

.7337 

.0873 

.77 

3.26 

-1.2 

[ln(l  +  x)]"' 

.7304 

.0826 

.62 

2.80 

-  1.6 

(l  +  xy'*-x'i* 

.5466 

.1212 

-.77 

3.06 

-  1.7 

ln(x) 

-3.1639 

1.1538 

.51 

2.64 

-  1.5 

Roanoke, Va  

x"' 

.7173 

.0756 

.71 

3.01 

-  1.5 

[Ind  +  x)]-' 

.7150 

.0727 

.62 

2.73 

-1.7 

(1  +  x)'  '^x'^^ 

.56% 

.1052 

-  .78 

3.06 

-  1.7 

Knoxville,  Tenn  

x»' 

.7287 

.0804 

.62 

2.84 

-  1.5 

[Ind+x)]"^ 

.5328 

.1150 

.72 

3.02 

-  1.5 

(l  +  x)"-^x"-' 

.5536 

.1123 

-.67 

2.81 

-  1.7 

Macon,  Ga  

x'" 

.7318 

.0889 

.72 

2.97 

-  1.6 

[ln(l  +  x)]"- 

.5378 

.1274 

.81 

3.10 

-  1.8 

[ln(l  +  x)]"  ' 

.7285 

.0843 

.60 

2.59 

-  1.9 

(l-l-x)"^-x"^ 

.5490 

.1236 

-  .73 

2.82 

-2.0 

Athens,  Ga.-  

x"  ' 

.7405 

.0896 

.58 

2.67 

-  1.7 

x"- 

.5564 

.1376 

.82 

3.29 

-  1.4 

[ln(l  +x)]"- 

.5502 

.1288 

.66 

2.76 

-  1.8 

.jacksonville,  ¥\a  

x"  ' 

.7419 

.0979 

.69 

2.77 

-  1.9 

[Ind  +  x)]"^ 

.5526 

.1409 

.77 

2.85 

-2.0 

^0.2 

.5600 

.1523 

.95 

3.45 

-  1.8 

Montfiomery,  Ala  

x"  ■ 

.7428 

.0956 

.67 

2.80 

-  1.7 

[Ind  +  x)]"-^ 

.5538 

.1376 

.74 

2.87 

-  1.9 

x"^ 

.5609 

.1484 

.92 

3.51 

-  1.6 

¹ The best transformations are those whose moments are nearest normality.
² 1955-61.


distribution provides a classical example of the
indeterminate case in the Stieltjes moment problem:
the moments only determine the curve
uniquely if b ≥ 1/2. If 0 < b < 1/2, there are
"shadow" functions which moment methods cannot
detect. Estimation aspects of the distribution have
not been completely resolved (Cohen (6), Johnson
and Kotz (21), and Hager and Bain (18)). One would
expect the transformation parameter b to be the
trickiest to pinpoint. Slight perturbations here
would clearly lead to marked distributional modifications.
It would also be surprising if anything but
very large samples could be tolerated for error
covariance analysis. Examples are given in tables
9 and 10. An unsatisfactory (as judged by the
chi-squared test) fit by maximum likelihood is given
for Savannah positive hourly observations. Positive
hourly observations for Savannah were fitted acceptably
well by searching the parameter space for
a solution to the Kolmogorov-Smirnov criterion,

$$ \max_j \left| F(x_j) - j/n \right| < \lambda/\sqrt{n}, $$

where F refers to the theoretical cumulative distribution,
n is the sample size, and λ is chosen to
correspond to the 80-percent acceptance level.
Although this test procedure is open to some
criticism, it is the only one chosen from maximum
likelihood, moments, moments of the logarithms of
the data, and others which leads to an admissible
solution. The fit is acceptable at the 80-percent
level, and although the value of a' is of the order
10^-17, this reflects the choice of parametrization.
Actually (a')^b is a moderate value. The theoretical
frequencies were computed by a continued fraction
development for the Type III probability integral
(the digital program being given by Bargmann (3)),
and checked by a 16-point Gaussian quadrature
formula in double precision (effectively 28 d.p.'s)
on a CDC 6400 computer; agreement to more than
four significant digits was found for all the subinterval
computations.
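
A Kolmogorov-Smirnov acceptance check of the kind described above can be sketched in a few lines of Python. This is a minimal illustration, not the search procedure used by the authors: the function names are hypothetical, the statistic is taken in its usual two-sided form (comparing F with both j/n and (j − 1)/n), and the value λ = 1.07 is the commonly tabulated large-sample constant for the 20-percent significance level, assumed here to correspond to the 80-percent acceptance level.

```python
import math


def ks_statistic(sample, cdf):
    """Largest gap between the fitted CDF and the empirical step function."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for j, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (j - 1)/n to j/n at x.
        d = max(d, abs(f - j / n), abs(f - (j - 1) / n))
    return d


def ks_accepts(sample, cdf, lam=1.07):
    """Return (accepted, d), where accepted means d < lam / sqrt(n)."""
    d = ks_statistic(sample, cdf)
    return d < lam / math.sqrt(len(sample)), d


if __name__ == "__main__":
    # Toy data against an exponential CDF with rate 10 (illustrative values only).
    data = [0.01, 0.02, 0.02, 0.05, 0.07, 0.11, 0.15, 0.23, 0.31, 0.62]
    print(ks_accepts(data, lambda x: 1.0 - math.exp(-10.0 * x)))
```

A parameter search would call `ks_accepts` with the candidate distribution's CDF at each trial parameter set and keep any set for which the criterion is met.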

Relations Between Rainfall Amounts
for Time Intervals of Different
Lengths

We  have  considered  rainfall  amounts  at  fixed 
locations  for  time  intervals  of  an  hour,  a  day,  a 
month,  and  so  on.  The  question  of  possible  inter- 
relations between  the  associated  hypothesized 
distributions  or  random  variables  is  a  natural  one. 
The  subject  has  been  studied  by  Kotz  and  Neumann 


Table 8. Cube root of positive daily precipitation and ninth root positive hourly precipitation for weather
stations in southeastern United States (1955-64)

Station               Transformation   Mean x̄    S.D. σ          Skewness √β₁   Kurtosis β₂   Type III criterion 2β₂ − 3β₁ − 6
Greensboro, N.C.      x^(1/3) ¹        0.5963    (not legible)      0.55          2.77           −1.4
                      x^(1/9) ²         .6966     .0904              .78          3.07           −1.7
Charleston, S.C.      x^(1/3)           .6640     .3101              .70          3.11           −1.3
                      x^(1/9)           .7158     .1029              .79          3.10           −1.7
Valdosta, Ga.         x^(1/3)           .6699     .3040              .71          3.16           −1.2
                      x^(1/9)           .7232     .1049              .80          3.25           −1.4
Savannah, Ga.         x^(1/3)           .6520     .3067              .63          3.00           −1.2
                      x^(1/9)           .7169     .1047              .76          2.92           −1.9
Elkins, W. Va.        x^(1/3)           .5609     .2256              .47          2.55           −1.5
                      x^(1/9)           .6748     .0772              .95          3.47           −1.8
Nashville, Tenn.      x^(1/3)           .6276     .2831              .53          2.96            −.9
                      x^(1/9)           .7075     .0923              .67          2.89           −1.6
Bristol, Tenn.        x^(1/3)           .5933     .2528              .49          2.68           −1.3
                      x^(1/9)           .6995     .0834              .67          3.07           −1.2
Florence, S.C.        x^(1/3)           .6247     .2859              .72          3.29           −1.0
                      x^(1/9)           .7096     .0942              .80          3.35           −1.2
Roanoke, Va.          x^(1/3)           .5719     .2527              .62          2.87           −1.4
                      x^(1/9)           .6917     .0813              .74          3.08           −1.5
Knoxville, Tenn.      x^(1/3)           .6040     .2737              .56          2.73           −1.5
                      x^(1/9)           .7040     .0866              .64          2.90           −1.4
Macon, Ga.            x^(1/3)           .6333     .2945              .52          2.66           −1.5
                      x^(1/9)           .7073     .0957              .75          3.05           −1.6
Athens, Ga. ³         x^(1/3)           .6780     .3135              .51          2.63           −1.5
                      x^(1/9)           .7168     .0966              .60          2.72           −1.6
Jacksonville, Fla.    x^(1/3)           .6302     .3074              .63          2.88           −1.4
                      x^(1/9)           .7184     .1057              .72          2.84           −1.9
Montgomery, Ala.      x^(1/3)           .6432     .3142              .71          3.18           −1.1
                      x^(1/9)           .7193     .1031              .70          2.87           −1.7

¹ Actual root for positive daily rainfall (exponent not legible in source).
² Actual root for positive hourly rainfall (exponent not legible in source).
³ 1955-61.
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Table  9.— Fit  of  generalized  gamma  to  Savannah,  Ga.;  positive  hourly  precipitation  by  maximum 

likelihood  estimation  (1955-64)^ 


X 

01)S€rv6(i 

r  \ n#»ptpH 

values 

P(Obs) 

Pi  Exp) 

u 

0  00-0  01 

1232 

992.7 

57.68 

57.68 

0.2494 

0.2010 

0.0484 

0  01-  02  

660 

718.5 

4.77 

62.45 

.3831 

.3465 

.0366 

0  02  -  03 

425 

501.2 

11.57 

74.02 

.4691 

.4479 

.0212 

0  03-  04 

309 

372.5 

10.84 

84.86 

.5317 

.5234 

.0083 

0.04-.05  

255 

289.2 

4.05 

88.91 

.5833 

.5819 

.0014 

0  05  -  06 

211 

231.7 

1.84 

90.75 

.6260 

.6288 

.0028 

0  06-  07 

147 

189.9 

9.70 

100.45 

.6558 

.6673 

.01 15 

0  07  -  08 

147 

158.6 

.85 

101.30 

.6856 

.6994 

.0138 

0  08-  09 

122 

134.4 

1.15 

102.45 

.7103 

.7266 

.0164 

0.09-.10  

113 

115.3 

.05 

102.50 

.7331 

.7500 

.0168 

0 10-  11 

89 

100.0 

1.20 

103.70 

.7512 

.7702 

.0191 

0 11-  12 

92 

87.4 

.24 

103.94 

.7698 

.7879 

.0181 

0 12-  13 

69 

77.1 

.84 

104.78 

.7838 

.8035 

.0198 

0  13-  14 

69 

68.4 

.01 

104.79 

.7977 

.8174 

.0196 

x            Observed   Expected   Chi-square   Cumulative   P(Obs)   P(Exp)      D
                         values                 chi-square
0.14-.15         65       61.0        .26         105.05     .8109    .8297    .0188
0.15-.16         59       54.7        .33         105.38     .8228    .8408    .0180
0.16-.17         48       49.3        .04         105.42     .8326    .8508    .0182
0.17-.18         42       44.7        .16         105.58     .8411    .8598    .0188
0.18-.19         46       40.6        .71         106.29     .8504    .8681    .0177
0.19-.20         47       37.0       2.67         108.97     .8599    .8756    .0157
0.20-.21         40       33.9       1.10         110.06     .8680    .8824    .0144
0.21-.22         34       31.1        .27         110.33     .8749    .8887    .0139
0.22-.23         33       28.6        .66         110.99     .8816    .8945    .0130
0.23-.24         16       26.4       4.12         115.11     .8848    .8999    .0151
0.24-.25         27       24.5        .26         115.38     .8903    .9048    .0146
0.25-.26         20       22.7        .32         115.69     .8943    .9094    .0151
0.26-.27         25       21.1        .73         116.42     .8994    .9137    .0143
0.27-.28         26       19.6       2.07         118.50     .9046    .9177    .0130
0.28-.29         22       18.3        .75         119.24     .9091    .9214    .0123
0.29-.30         16       17.1        .07         119.31     .9123    .9248    .0125
0.30-.31         19       16.0        .56         119.87     .9162    .9281    .0119
0.31-.32         17       15.0        .27         120.14     .9196    .9311    .0115
0.32-.33          9       14.1       1.84         121.98     .9214    .9340    .0125
0.33-.34         14       13.2        .04         122.02     .9243    .9367    .0124
0.34-.35         14       12.5        .19         122.21     .9271    .9392    .0121
0.35-.36         14       11.8        .43         122.64     .9299    .9416    .0116
0.36-.37         26       11.1      20.03         142.67     .9352    .9438    .0086
0.37-.38         13       10.5        .61         143.27     .9378    .9459    .0081
0.38-.39          6        9.9       1.55         144.82     .9391    .9479    .0088
0.39-.40          9        9.4        .02         144.83     .9409    .9498    .0090
0.40-.41          8        8.9        .09         144.92     .9425    .9516    .0091
0.41-.42          9        8.4        .04         144.96     .9443    .9533    .0090
0.42-.43          7        8.0        .13         145.09     .9457    .9550    .0092
0.43-.44         10        7.6        .74         145.83     .9478    .9565    .0087
0.44-.45          9        7.3        .42         146.25     .9496    .9580    .0084
0.45-.46         10        6.9       1.39         147.64     .9516    .9594    .0078
0.46-.47          6        6.6        .05         147.69     .9528    .9607    .0079
0.47-.48          9        6.3       1.18         148.88     .9546    .9620    .0073
0.48-.49          8        6.0        .68         149.55     .9563    .9632    .0069
0.49-.50         11        5.7       4.88         154.43     .9585    .9643    .0059
0.50-.51         10        5.5       3.76         158.20     .9605    .9655    .0049
0.51-.52          3        5.2        .95         159.15     .9611    .9665    .0054
0.52-.53          8        5.0       1.80         160.95     .9627    .9675    .0048
0.53-.54          3        4.8        .67         161.61     .9634    .9685    .0051
0.54-.55          8        4.6       2.54         164.15     .9650    .9694    .0045
0.55-.56          3        4.4        .44         164.60     .9656    .9703    .0047
0.56-.57          6        4.2        .75         165.35     .9668    .9712    .0044
0.57-.58          4        4.0        .00         165.35     .9676    .9720    .0044
0.58-.59          4        3.9        .00         165.35     .9684    .9728    .0044
0.59-.60          5        3.7        .43         165.78     .9694    .9735    .0041
0.60-.61          3        3.6        .10         165.88     .9700    .9743    .0042
0.61-.62          4        3.5        .09         165.96     .9708    .9750    .0041
0.62-.63          8        3.3       6.58         172.54     .9725    .9756    .0032
0.63-.64          5        3.2       1.01         173.56     .9735    .9763    .0028
0.64-.65          4        3.1        .27         173.83     .9743    .9769    .0026
0.65-.66          2        3.0        .32         174.15     .9747    .9775    .0028
0.66-.67          1        2.9       1.21         175.36     .9749    .9781    .0032
0.67-.68          2        2.8        .21         175.57     .9753    .9786    .0033
0.68-.69          2        2.7        .16         175.73     .9757    .9792    .0035
0.69-.70          2        2.6        .13         175.86     .9761    .9797    .0036
0.70-∞          118       88.7       9.69         185.55    1.0000    .9977    .0023

¹ N = 4939    x̄ = 0.10430    s² = 0.03661

^  =  43.8  6  =  0.10  a  =  0.1515x  10-" 

χ²_{99.95%}(68) = 113.0         χ²(68) = 185.6

Table 10. Fit of generalized gamma distribution to Savannah, Ga.; positive hourly precipitation by empirical estimation (1955-64)¹

x            Observed   Expected   Chi-square   Cumulative   P(Obs)   P(Exp)      D
                         values                 chi-square
0.00-0.01      1232     1160.1       4.45           4.45     0.2494   0.2349   0.0146
0.01-.02        660      664.0        .02           4.48     .3831    .3693    .0137
0.02-.03        425      445.1        .91           5.39     .4691    .4595    .0097
0.03-.04        309      327.4       1.04           6.42     .5317    .5258    .0059
0.04-.05        255      254.1        .00           6.43     .5833    .5772    .0061
0.05-.06        211      204.5        .21           6.63     .6260    .6186    .0074
0.06-.07        147      168.8       2.82           9.46     .6558    .6528    .0030
0.07-.08        147      142.2        .16           9.62     .6856    .6816    .0040
0.08-.09        122      121.6        .00           9.62     .7103    .7062    .0041
0.09-.10        113      105.3        .56          10.18     .7331    .7275    .0056
0.10-.11         89       92.2        .11          10.29     .7512    .7462    .0050
0.11-.12         92       81.4       1.37          11.66     .7698    .7627    .0071
0.12-.13         69       72.5        .17          11.83     .7838    .7774    .0064
0.13-.14         69       64.9        .26          12.09     .7977    .7905    .0072
0.14-.15         65       58.5        .72          12.81     .8109    .8023    .0085
0.15-.16         59       53.0        .68          13.49     .8228    .8131    .0098
0.16-.17         48       48.2        .00          13.49     .8326    .8228    .0097
0.17-.18         42       44.1        .10          13.59     .8411    .8318    .0093
0.18-.19         46       40.4        .77          14.36     .8504    .8400    .0104
0.19-.20         47       37.2       2.58          16.93     .8599    .8475    .0124
0.20-.21         40       34.4        .93          17.86     .8680    .8544    .0135
0.21-.22         34       31.8        .15          18.01     .8749    .8609    .0140
0.22-.23         33       29.5        .41          18.42     .8816    .8669    .0147
0.23-.24         16       27.5       4.80          23.22     .8848    .8724    .0124
0.24-.25         27       25.6        .07          23.29     .8903    .8776    .0126
0.25-.26         20       24.0        .66          23.95     .8943    .8825    .0118
0.26-.27         25       22.4        .29          24.24     .8994    .8870    .0124
0.27-.28         26       21.1       1.16          25.39     .9046    .8913    .0134
0.28-.29         22       19.8        .24          25.64     .9091    .8953    .0138
0.29-.30         16       18.6        .38          26.01     .9123    .8991    .0133
0.30-.31         19       17.6        .11          26.13     .9162    .9026    .0136
0.31-.32         17       16.6        .01          26.14     .9196    .9060    .0136
0.32-.33          9       15.7       2.86          29.00     .9214    .9092    .0123
0.33-.34         14       14.9        .05          29.05     .9243    .9122    .0121
0.34-.35         14       14.1        .00          29.05     .9271    .9150    .0121
0.35-.36         14       13.4        .03          29.08     .9299    .9177    .0122
0.36-.37         26       12.7      13.87          42.95     .9352    .9203    .0149
0.37-.38         13       12.1        .07          43.02     .9378    .9228    .0151
0.38-.39          6       11.5       2.65          45.66     .9391    .9251    .0140
0.39-.40          9       11.0        .36          46.02     .9409    .9273    .0136
0.40-.41          8        ?           ?             ?       .9425      ?        ?
0.41-.42          9        ?           ?             ?       .9443      ?        ?
0.42-.43          7        ?           ?             ?       .9457      ?        ?
0.43-.44         10        ?           ?             ?       .9478      ?        ?
0.44-.45          9        8.8        .01          47.48     .9496    .9370    .0126
0.45-.46         10        ?           ?             ?       .9516      ?        ?
0.46-.47          6        ?           ?             ?       .9528      ?        ?
0.47-.48          9        ?           ?             ?       .9546      ?        ?
0.48-.49          8        ?           ?             ?       .9563      ?        ?
0.49-.50         11        7.1       2.12          50.69     .9585    .9449    .0136
0.50-.51         10        ?           ?             ?       .9605      ?        ?
0.51-.52          3        ?           ?             ?       .9611      ?        ?
0.52-.53          8        6.3        .44          54.54     .9627    .9489    .0139
0.53-.54          3        ?           ?             ?       .9634      ?        ?
0.54-.55          8        5.9        .77          56.88     .9650    .9513    .0137
0.55-.56          3        ?           ?             ?       .9656      ?        ?
0.56-.57          6        ?           ?             ?       .9668      ?        ?
0.57-.58          4        ?           ?             ?       .9676      ?        ?
0.58-.59          4        ?           ?             ?       .9684      ?        ?
0.59-.60          5        4.9        .00          58.73     .9694    .9566    .0128
0.60-.61          3        ?           ?             ?       .9700      ?        ?
0.61-.62          4        ?           ?             ?       .9708      ?        ?
0.62-.63          8        ?           ?             ?       .9725      ?        ?
0.63-.64          5        ?           ?             ?       .9735      ?        ?
0.64-.65          4        4.2        .01          62.43     .9743    .9611    .0132
0.65-.66          2        4.0       1.02          63.45     .9747    .9619    .0128
0.66-.67          1        3.9       2.15          65.60     .9749    .9627    .0122
0.67-.68          2        3.8        .84          66.44     .9753    .9635    .0118
0.68-.69          2        3.7        .75          67.19     .9757    .9642    .0115
0.69-.70          2        3.6        .68          67.87     .9761    .9649    .0112
0.70-∞          118      142.1       4.10          71.97    1.0000    .9937    .0063

¹ N = 4939    x̄ = 0.10430    s² = 0.03661

A' = 33.8    b' = 0.10    a' = 0.2030 × 10⁻¹⁶

χ²_{70%}(68) = 73.6    χ²(68) = 71.97    K.S. 80-percent value = 0.0152

(25) who consider the prediction of mean and variance for a multiple of a given time interval. They also consider the distribution of the sum of n exponentially correlated gamma variates (see Kotz and Neumann (25), Kotz and Adams (24)). Now, if we consider whether we can deduce properties of rainfall amounts per day from rainfall amounts per hour, it is evident in certain climatic regimes that the number of wet hours per day is certainly not 24, but is itself a random variable. This suggests the study of:

$S_n = X_1 + X_2 + \cdots + X_n$  (22)

where the number n is a random quantity and the X's are not independent. It is now assumed that:

$E X_j = \mu, \quad \operatorname{var} X_j = \sigma^2, \quad j = 1, 2, \ldots, n$  (23)

$\operatorname{cov}(X_j, X_{j+1}) = r\sigma^2, \quad j = 1, 2, \ldots, n-1$

$\operatorname{cov}(X_j, X_{j+s}) = 0, \quad s > 1.$

As for n, we assume

$E n = \nu_1, \quad \operatorname{var} n = \nu_2.$  (24)

It may be proved that the mean of the sum is

$E S_n = \mu\nu_1,$  (25a)

and its variance (the serial correlation of lag 1 between the variables being the only significant correlation):

$\sigma^2(S_n) = \sigma^2\nu_1 + 2r\sigma^2(\nu_1 - 1) + \mu^2\nu_2.$  (25b)

If the sum consisted of a fixed number of independent random variates, then we should have for comparison,

$E S_n = n\mu,$  (26a)

$\operatorname{var} S_n = n\sigma^2,$  (26b)

so that in the present applications the mean rainfall per hour under equation 25a will be less than under equation 26a, whereas the variance under the more realistic model in equation 25b will be larger than under equation 26b.
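Equation 25b is stated without proof. One way to see where its three terms come from is to condition on the number of wet hours. The sketch below additionally assumes that n is independent of the X's and that n is at least 1; neither assumption is stated explicitly in the text.

```latex
\begin{aligned}
E(S_n \mid n) &= n\mu, \\
\operatorname{var}(S_n \mid n) &= n\sigma^2 + 2(n-1)\,r\sigma^2
   \quad\text{(only the $n-1$ adjacent pairs are correlated)}, \\
\sigma^2(S_n) &= E\left[\operatorname{var}(S_n \mid n)\right]
               + \operatorname{var}\left[E(S_n \mid n)\right] \\
              &= \sigma^2\nu_1 + 2r\sigma^2(\nu_1 - 1) + \mu^2\nu_2 .
\end{aligned}
```

The last term, $\mu^2\nu_2$, is the contribution of the random number of wet hours; it vanishes when n is fixed, and with r = 0 the expression then reduces to equation 26b.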

In table 11 we give means and variances for hourly rainfall and the number of wet hours per day for several stations. A glance at the number distribution indicates that it does not follow the Poisson model; in fact, it has a relatively long tail and is not easy to fit by the usual well-known discrete distributions (see Shenton and Skees (37)). Table 12 shows serial correlations of order 1 for wet hours for a number of stations. Table 13 compares the actual daily rainfall amount variance and the predicted value using equation 25b with and without serial correlation. The predicted variances allowing for a random number of wet hours and a serial correlation of order 1 are quite satisfactory.


Table 11. Mean and variance for amount of rain per hour and number of wet hours per day for several stations

                              Amount               Wet hours per day
Station                   Mean     Variance        Mean     Variance
Greensboro, N.C.        0.07324    0.01501        5.0423    18.3352
Charleston, S.C.         .10701     .03997        4.6846    14.8015
Valdosta, Ga.            .12006     .05866        4.1762    12.5379
Savannah, Ga.            .10966     .03769        4.4147    13.9550
Elkins, W. Va.           .04897     .00586        5.2730    17.6931
Nashville, Tenn.         .08324     .01772        4.8141    16.1751
Bristol, Tenn.           .06849     .0110?        4.6949    14.05??
Florence, S.C.           .08972     .02707        4.4870    14.2930
Roanoke, Va.             .06133     .00825        5.0058    18.4528
Knoxville, Tenn.         .07438     .01222        4.9689    16.4609
Macon, Ga.               .08837     .02435        4.7509    17.4191
Athens, Ga.              .09613     .02320        5.3526    19.5809
Jacksonville, Fla.       .11195     .03883        4.0449    12.8112
Montgomery, Ala.         .10924     .03722        4.3780    14.4366
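As a numerical check on equations 25a and 25b, the short sketch below plugs in the Greensboro, N.C., entries of tables 11 and 12 (hourly mean 0.07324, hourly variance 0.01501, mean and variance of wet hours per day 5.0423 and 18.3352, serial correlation 0.33765). The function name is ours and purely illustrative. Under these inputs the two variances come out near 0.174 and 0.215, in line with the Greensboro entries of table 13.

```python
def random_sum_moments(mu, sigma2, nu1, nu2, r=0.0):
    """Mean and variance of S_n from equations 25a and 25b.

    mu, sigma2 : mean and variance of the hourly amounts X_j
    nu1, nu2   : mean and variance of the number of wet hours n
    r          : lag-1 serial correlation of the X_j (0 means uncorrelated)
    """
    mean = mu * nu1                                          # equation 25a
    variance = (sigma2 * nu1
                + 2.0 * r * sigma2 * (nu1 - 1.0)
                + mu ** 2 * nu2)                             # equation 25b
    return mean, variance


# Greensboro, N.C., values as read from tables 11 and 12.
mu, sigma2 = 0.07324, 0.01501
nu1, nu2 = 5.0423, 18.3352
r = 0.33765

mean, var_no_corr = random_sum_moments(mu, sigma2, nu1, nu2)
_, var_corr = random_sum_moments(mu, sigma2, nu1, nu2, r)

print(f"daily mean from hourly moments:       {mean:.5f}")
print(f"daily variance, without correlation:  {var_no_corr:.5f}")
print(f"daily variance, with lag-1 correlation: {var_corr:.5f}")
```

Setting r to zero isolates the effect of the random number of wet hours, so the difference between the two variances is the part attributable to serial correlation alone.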


Table 12. Serial correlation of order 1 over wet hours for several stations ¹

Station                 Covariance    Serial correlation
Greensboro, N.C.         0.00573          0.33765
Charleston, S.C.          .01361           .30019
Valdosta, Ga.             .01306           .26328
Savannah, Ga.             .01131           .26416
Elkins, W. Va.            .00187           .32684
Nashville, Tenn.          .00734           .37271
Bristol, Tenn.            .00300           .31433
Florence, S.C.            .00869           .28689
Roanoke, Va.              .00251           .29332
Knoxville, Tenn.          .00346           .31078
Macon, Ga.                .00697           .25748
Athens, Ga.               .00891           .34773
Jacksonville, Fla.        .01222           .27112
Montgomery, Ala.          .00987           .23304

¹ Wet hours are all sets of sequences of 2 or more wet hours.


Evidently,  predictions  of  means  and  variances 
of  rainfall  amounts  for  larger  periods  can  be  based 
on  those  for  shorter  periods.  The  problem  of  making 
predictions  about  the  corresponding  distributions 
again  hinges  on  identifying  a  generally  applicable 
distribution  for  rainfall  amounts  for  short  periods 
such  as  an  hour. 

Summary  and  Conclusions 

Distributions, including the gamma and its root transformation (the generalized gamma distribution), have been fitted to rainfall amounts for short periods (days, hours) for several stations in the southeastern States. Good fits are not common, doubtless because of the excessive skewness and kurtosis exhibited by the observations when small intervals of time are considered. For daily amounts for southeastern locations, since the relative occurrence of small observations is high, a fitting procedure for the gamma distribution due to Das (9) gives satisfactory results.

A study has been made of the effect of some 70 transformations on daily and hourly rainfall amounts. In general, a 10th root or logarithm reduces the skewness and kurtosis to within range of the values $\beta_1 = 0$, $\beta_2 = 3$ for the normal distribution; sometimes ln x overcorrects. We are unable at this time to give accurate guidelines for the optimum transformation to use for large samples of observations for which several low-order moments are available.

Lastly, we have briefly discussed the relationship of means and variances for short-time intervals compared to those for longer intervals, the basic idea being that the longer interval is a random multiple of the shorter interval.


Table 13. Mean and variance of several stations for different periods, showing comparison between daily moments and hourly moments converted to daily values

                                 Mean                          Variance
Station               From hourly    Daily      From hourly ¹    Daily     From hourly ²
Greensboro, N.C.        0.36939     0.36968       0.17404       0.26794      0.21501
Charleston, S.C.         .50130      .50171        .35674        .46078       .44516
Valdosta, Ga.            .50139      .50172        .42570        .48994       .52381
Savannah, Ga.            .48412      .48449        .33420        .41784       .40220
Elkins, W. Va.           .25822      .25857        .07333        .09430       .08970
Nashville, Tenn.         .40073      .40105        .19738        .27904       .24776
Bristol, Tenn.           .32155      .32188        .11759        .14881       .14314
Florence, S.C.           .40257      .40286        .23652        .32152       .29068
Roanoke, Va.             .30701      .30729        .11071        .15795       .13009
Knoxville, Tenn.         .36959      .36993        .15179        .22245       .18193
Macon, Ga.               .41984      .42018        .25171        .31755       .29875
Athens, Ga.              .51455      .51478        .30513        .45698       .37535
Jacksonville, Fla.       .45283      .45313        .31762        .39588       .38173
Montgomery, Ala.         .47825      .47860        .33523        .49777       .39383

¹ Calculated from hourly moments without correlation.

² Calculated from hourly moments with serial correlation of order 1.


A STOCHASTIC MODEL FOR DAILY RAINFALL DATA SYNTHESIS

By N. N. Khanal and R. L. Hamrick ¹


Abstract 

A  simple  mathematical  model  was  set  up  to 
synthesize  daily  rainfall  values  observed  at  a  point. 
The  synthesized  daily  rainfall  values  should  be 
suitable  for  use  in  the  Watershed  Systems  Model 
prepared  in-house  by  the  FCD  engineers. 

A first-order Markov chain model was proposed to do this. A statistical comparison was made between the historical and the synthesized daily rainfall values, and they agree fairly well.


Introduction 

A stochastic process is defined as a collection of random variables $X(t) = [X_t, t \in T]$ which is a function of time and whose variate $X_t$ runs along in time t within a range T. The set T is called the index set of the process. The stochastic process can be regarded as a discrete or a continuous process, depending on T. If T takes only discrete values, $T = [0, 1, 2, \ldots]$, the process is termed a discrete process. If T takes continuous values, $T = [t : -\infty < t < +\infty]$, the process is termed a continuous process. A value which X(t) takes is called a state of the process. The set in which all the values of X(t) lie is called the state space (2).

Daily rainfall observed at a point is a continuously recorded hydrologic process. Analysis is performed by transforming the continuous process into a discrete process with time interval $\Delta t$. A real valued function defined on a sample space is called a random variable. Description of daily rainfall values as a discrete random variable is satisfactory when considering the recorded daily values, but it only approximates the natural rainfall process (8).


¹ Systems engineer and planning engineer, Central and Southern Florida Flood Control District, West Palm Beach, Fla.

Let $X_0, X_1, X_2, \ldots$ be the successive observations of daily rainfall values at times t = 0, 1, 2, ..., T. The possible values of $X_i$ are 0.00, 0.01, 0.02, .... The collection of $X_1, X_2, \ldots$ is referred to as the rainfall process. Rainfall amounts observed during different short time intervals (hours, days) are not independent events.

According to Grace and Eagleson (7), "There is sufficient information available in the literature to indicate that there is negligible dependence or serial correlation in series of annual rainfall depths. However, when the series of rainfall depths over shorter time intervals are analyzed, it is normally found that there is definite dependence inherent in the series."

Pattison (8) analyzed the successive hours of rainfall data observed at Boulder Creek, Calif., and showed the dependency of one hour's rainfall on that of successive hours by estimating the conditional probabilities. Given zero rainfall during hour t,

$\mathrm{pr}[X_{t+1} = 0.00 \mid X_t = 0.00] = 0.962$

$\mathrm{pr}[X_{t+1} = 0.01 \mid X_t = 0.00] = 0.017.$

When the rainfall during hour t is 0.01, the conditional probabilities are

$\mathrm{pr}[X_{t+1} = 0.00 \mid X_t = 0.01] = 0.397$

and

$\mathrm{pr}[X_{t+1} = 0.01 \mid X_t = 0.01] = 0.261.$

Pattison further stated that the dependence is not confined only to consecutive hours of rainfall observations.

Gabriel and Neumann showed this dependence by estimating the conditional probabilities from successive days' rainfall observations for Tel Aviv, Israel, for January:

pr[Wet day | previous day wet] = 0.674
pr[Wet day | previous day dry] = 0.293
pr[Dry day | previous day wet] = 0.326

and

pr[Dry day | previous day dry] = 0.707.

Wiser (10) has also shown daily dependencies for North Carolina stations. He states that dependency is found to be a quite general phenomenon. The degree of dependence is less for months than for days, is less for wet periods than for dry periods, and at some locations tends to a condition in which information about only the previous day is significant.

To check the dependence of one day's rainfall on that of successive days, conditional probabilities were estimated from daily rainfall records for July, observed at Bithlo, Fla. The estimated conditional probabilities are as follows:

pr[Wet day | previous day wet] = 0.517

pr[Wet day | previous day dry] = 0.327

pr[Dry day | previous day wet] = 0.483

and

pr[Dry day | previous day dry] = 0.673.

From the above estimates of the conditional probabilities, it can be concluded that the hourly and the daily rainfall processes possess properties similar to those of the Markov process. The Markov process property states that the probability that a system will be in a given state at a given time, t, may be deduced from a knowledge of its state at any earlier time, $t_0$, and does not depend on the history of the system before $t_0$. A Markov process with discrete parameter is called a Markov chain.
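For the two-state (dry or wet) case, a first-order chain is fully described by two of the conditional probabilities quoted above. As a rough illustration, and assuming the chain is stationary, the long-run fraction of wet days follows directly from them. The function name below is ours, not the authors'.

```python
def stationary_wet_probability(p_wet_given_dry, p_wet_given_wet):
    """Long-run fraction of wet days for a two-state first-order Markov chain.

    Balances the flow dry -> wet against wet -> dry:
        pi_wet * p(dry | wet) = (1 - pi_wet) * p(wet | dry)
    """
    p_dry_given_wet = 1.0 - p_wet_given_wet
    return p_wet_given_dry / (p_wet_given_dry + p_dry_given_wet)


# Conditional probabilities quoted in the text for July at Bithlo, Fla.,
# and for January at Tel Aviv.
bithlo = stationary_wet_probability(p_wet_given_dry=0.327, p_wet_given_wet=0.517)
tel_aviv = stationary_wet_probability(p_wet_given_dry=0.293, p_wet_given_wet=0.674)

print(f"Bithlo, July:      {bithlo:.3f}")
print(f"Tel Aviv, January: {tel_aviv:.3f}")
```

If the previous day carried no information, pr[wet | previous day wet] and pr[wet | previous day dry] would both equal this long-run fraction; the gap between them is a simple measure of the day-to-day dependence discussed above.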

An Nth-order Markov chain model for a discrete stochastic process $[X_t, t = 0, 1, 2, \ldots]$ can be written mathematically as follows:

$\mathrm{pr}[X_t = x_t \mid X_{t-1} = x_{t-1}, \ldots, X_1 = x_1] = \mathrm{pr}[X_t = x_t \mid X_{t-1} = x_{t-1}, \ldots, X_{t-N} = x_{t-N}]$  (1)

for all $x_1, x_2, \ldots, x_t$ and $t = N+1, N+2, \ldots$.


A first-order Markov chain model (N = 1) is written as:

$\mathrm{pr}[X_t = x_t \mid X_{t-1} = x_{t-1}, \ldots, X_1 = x_1] = \mathrm{pr}[X_t = x_t \mid X_{t-1} = x_{t-1}].$  (2)

If $X_{t-1} = i$ and $X_t = j$, then the system has made a transition from state i to state j at the tth step. The probabilities of the various transitions that may occur are called the transition probabilities and are written as:

$P_{ij} = \mathrm{pr}[X_t = j \mid X_{t-1} = i].$  (3)

The transition probabilities are estimated from the equivalent frequencies observed in the historic data. The frequency of occurrence is obtained from the transitions of the process from each of the states during time t to the same or other states in time t+1. The frequencies can be arranged in the form of table 1, in which $f_{ij}$ represents the frequency of occurrence of transitions between $(X_t = i)$ and $(X_{t+1} = j)$. The probability $P_{ij}$ is estimated as:

$P_{ij} = f_{ij} / F_i$  (4)

$F_i = \sum_{j} f_{ij}$ for $i = 1, 2, \ldots, T$ and $j = 1, 2, \ldots, T$.  (5)
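Equations 4 and 5 amount to counting transitions and normalizing each row by its total. A minimal sketch follows, assuming states are numbered from 1 as in table 1; the short state sequence is hypothetical and serves only to exercise the function.

```python
def transition_probabilities(states, n_states):
    """Estimate P_ij = f_ij / F_i from a sequence of states numbered 1..n_states."""
    # f_ij: number of observed transitions from state i to state j.
    freq = [[0] * n_states for _ in range(n_states)]
    for previous, current in zip(states, states[1:]):
        freq[previous - 1][current - 1] += 1

    probs = []
    for row in freq:
        row_total = sum(row)  # F_i of equation 5
        # A state that was never left has no estimate; its row is set to zeros.
        probs.append([f / row_total if row_total else 0.0 for f in row])
    return freq, probs


# Hypothetical sequence of daily states, for illustration only.
sequence = [1, 1, 2, 1, 1, 1, 3, 2, 1, 1]
freq, probs = transition_probabilities(sequence, n_states=3)

for i, (f_row, p_row) in enumerate(zip(freq, probs), start=1):
    print(i, f_row, [round(p, 3) for p in p_row])
```

With only a few years of record, many of the counts for the rarer states will be zero or very small, so the corresponding row estimates should be read with caution.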

Table 1. Transition frequencies

                          State j
State i        1        2        3       ...       T          F_i
   1          f_11     f_12     f_13     ...      f_1T        F_1
   2          f_21     f_22     f_23     ...      f_2T        F_2
  ...
   T          f_T1     f_T2     f_T3     ...      f_TT        F_T


Various researchers have applied Markov chain models to rainfall process analysis. Gabriel and Neumann (6) were the first to apply the Markov chain model to determine the occurrence or non-occurrence of rain on any day. They reported that the first-order Markov chain model fitted well to frequencies deduced from the daily rainfall observations. They were not concerned with synthesizing the rainfall depth values. The first reported research on synthesizing continuous sequences of hourly rainfall data was by Pattison (8). Pattison used the first- and the sixth-order Markov chain models. The first-order model was used for wet periods, and the sixth-order model was used for dry periods. Pattison states that a first-order Markov chain model fails to describe the transition between a sequence of wet hours and a sequence of dry hours because the occurrence of an hour of zero rainfall at the end of a wet-hour sequence is considered by the process to be most likely the start of a sequence of dry hours, and in reality it is not so.

The problem which Pattison had with the in-between sequence while synthesizing the hourly rainfall values does not arise with the daily rainfall values. A daily rainfall value is a point value, and there is no in-between sequence. No reported research was available on synthesizing daily rainfall values. Many long records of daily rainfall values are available in comparison to hourly records, and these provide valuable information concerning the characteristics of the observation sites. The objective of this work is to set up a stochastic model for daily rainfall data synthesis. The synthesized daily rainfall data should duplicate the important statistical properties of the observed daily rainfall data. The synthesized daily rainfall values will be used as input to the watershed systems model being developed in-house. Daily rainfall data from Bithlo, Fla., were used to estimate the model parameters (transition probabilities).

Application  of  the  Markov  Chain 
Model  to  Synthesize  the  Daily 
Rainfall  Values  for  Bithlo 

Application of the first-order Markov chain model (equation 2) was made for the daily rainfall data synthesis for Bithlo. The synthesis procedure is exemplified by the detailed calculation of the model parameter estimation and the steps involved in synthesizing the daily rainfall depth values for July. Previously, the conditional probability of the actual day being wet given the condition of the previous day (wet or dry) for July was estimated. No rainfall depth assignments were made.

Table 2 gives 10 years of daily rainfall values observed at Bithlo for July. It can be seen from the table that the minimum observed daily rainfall value was 0.02 inch, and the maximum was 3.52 inches. If 3.52 inches is taken as the upper limit for the daily rainfall values, then X(t) = 0.00, 0.01, 0.02, ..., 3.52 can still take 353 different values. The probability of getting 3.52 inches of daily rainfall is very low. To keep the daily rainfall process from taking so many different values, daily rainfall values were grouped into 14 intervals as shown in table 3. This reduces the number of states which the model can take and at the same time eases the computational scheme. These 14 states now constitute the states of the first-order Markov chain model for the daily rainfall synthesis procedure for Bithlo. The daily rainfall process in terms of the Markovian states is presented in table 4. In terms of the first-order model, the process can pass from any of the 14 states on the previous day to any of the 14 states on the actual day. In other words, the size of the transitional probability matrix will be 14 by 14.
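The grouping into 14 states can be written as a small lookup. The sketch below uses the interval limits of table 3, with state 1 for a dry day, and also returns the interval midpoint, which the synthesis procedure later uses as the rainfall amount for each state. The function and constant names are ours and only illustrative.

```python
import bisect

# Lower and upper limits (inches) of states 2 through 14, as listed in table 3.
LOWER = [0.01, 0.11, 0.21, 0.31, 0.41, 0.51, 0.76,
         1.01, 1.51, 2.01, 2.51, 3.01, 3.51]
UPPER = [0.10, 0.20, 0.30, 0.40, 0.50, 0.75, 1.00,
         1.50, 2.00, 2.50, 3.00, 3.50, 4.00]


def rainfall_to_state(depth):
    """Map a daily rainfall depth (inches, to 0.01) to its state 1..14."""
    depth = round(depth, 2)
    if depth <= 0.0:
        return 1                      # dry day
    if depth > UPPER[-1]:
        raise ValueError("depth exceeds the upper limit of table 3")
    # First interval whose upper limit is not below the depth.
    return bisect.bisect_left(UPPER, depth) + 2


def state_midpoint(state):
    """Representative depth (inches) for a state: the interval midpoint."""
    if state == 1:
        return 0.0
    return (LOWER[state - 2] + UPPER[state - 2]) / 2.0


for depth in (0.0, 0.44, 2.45, 3.52):
    s = rainfall_to_state(depth)
    print(depth, s, state_midpoint(s))
```

The unequal interval widths keep the frequent small amounts finely resolved while lumping the rare large amounts together, which is the trade-off the grouping is meant to achieve.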

The frequency of the daily rainfall process in terms of the 14 states defined previously (equation 4) is presented in table 5. Table 5 displays the past behavior of the daily rainfall process for Bithlo for July. From the table, it can be seen that the process passes 120 times from state 1 on the previous day to state 1 on the actual day. It passes nine times to state 2, and so on.

An estimate of the transitional probability can be made from the frequency table by use of equation 4. The transitional probability estimates are presented in table 6. These transitional probabilities are in cumulative form. Probabilities in parentheses are the interpolated probabilities for synthesis purposes.

This transitional probability matrix provides the computational scheme for stepping from rainfall on one day to the rainfall one day later. Assumption was made about the transition probability being stationary, that is, independent of time within each month of the year, but varying from month to month.

Once the transitional probabilities are estimated and depth of the daily rainfall values assigned to each of the 14 states as defined in table 3, the procedures for the synthesis of the daily rainfall values

Table 2. Daily precipitation from Bithlo for July


Days 


\... 
2... 
3... 
4... 
5... 
6... 
7... 
8... 
9... 
10. 
11. 
12. 
13. 
14. 
15. 
16. 
17. 
18. 
19. 
20. 
21. 
22. 
23. 
24. 
25. 
26. 
27. 
28. 
29. 
30. 
31. 


Years 


1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

0.44 

2.45 

0.24 

0.64 

0.62 

0.06 

.22 

1.30 

.90 

0.65 

.91 

0.20 

.57 

.32 

.70 

.89 

1.98 

0.30 

1.44 

.36 

2.00 

3.40 

.92 

1.82 

.70 

.52 

3.58 

.42 

.37 

.03 

.58 

.26 

1.22 

.28 

1.00 

.56 

.13 

.36 

.07 

.30 

.08 

.31 

1.22 

.65 

.05 

.36 

.20 

.33 

.72 

.47 

.72 

.15 

1.02 

.20 

.20 

.22 

.26 

1.05 

.26 

.37 

.36 

1.67 

.33 

1.44 

.49 

.23 

.85 

.31 

.05 

.70 

1.42 

2.22 

2.85 

.98 

.17 

.38 

.72 

.73 

3.52 

.56 

1.00 

.48 

.40 

2.82 

1.22 

.32 

.60 

.54 

.76 

1.70 

1.00 

.40 

1.00 

.60 

1.05 

1.48 

.02 

1.50 

1.90 

.22 

1.28 

.53 

.49 

.50 

.02 

.48 

.26 

.25 

.32 

.08 

.22 

.31 

.11 

.35 

.10 

.45 

.42 

.24 

.07 

.84 

.09 

made by use of the Monte-Carlo simulation technique are as follows.

The Monte-Carlo simulation technique, that is, random sampling, was used to generate the daily synthetic rainfall data. The procedure was programmed for the IBM 1130. A flow chart is listed in the appendix.

1.  For  the  synthesis  of  the  rainfall  process  for 
day,  1,  determine  the  state  of  the  previous  day, 
t.  It  can  be  assumed  that  the  state  of  the  previous 
day,  t,  is  dry. 

2.  State of day, t+1, is selected at random by using the estimated probabilities that determine the transitions from a dry state to either a dry or wet state in day, t+1.

3.  If the state of day, t+1, is determined to be dry, the synthesis moves on to the next day.


Table 3. Interval grouping of daily rainfall

Daily rainfall state       Daily rainfall interval
                                   Inches
         1                           0
         2                        0.01-0.10
         3                         .11- .20
         4                         .21- .30
         5                         .31- .40
         6                         .41- .50
         7                         .51- .75
         8                         .76-1.00
         9                        1.01-1.50
        10                        1.51-2.00
        11                        2.01-2.50
        12                        2.51-3.00
        13                        3.01-3.50
        14                        3.51-4.00




4.  If the state of day, t+1, is determined to be wet, then the magnitude of X(t+1) is selected using the transition probability and the process is terminated for day, t+1.

5.  If the state of day, t, is found to be wet [X(t) = 2, 3, ..., 14], the state of day, t+1, is selected at random using estimates of the probabilities. After the selection of X(t+1), the procedure moves on to the next day.

6.  The  state  of  the  rainfall  system  for  each  day 
has  to  be  transformed  into  a  rainfall  amount  in 
inches.  Midpoint  values  of  the  rainfall  intervals 
listed  in  table  3  were  used  as  the  rainfall  amounts 
for  each  state. 

7.  Different probability estimates of daily rainfall were used for different months to take into account the seasonal variability.

8.  Repeat  steps  1  through  7  for  as  many  years  of 
synthesized  daily  rainfall  values  as  desired. 
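The steps above can be sketched in a few lines. This is not the authors' IBM 1130 program: the three-state matrix and the midpoint depths below are hypothetical stand-ins for the 14-state July matrix of table 6 and the midpoints of table 3, and the function names are ours.

```python
import random
from itertools import accumulate


def next_state(current, transition, rng):
    """Draw the next day's state (1-based) from the row of the current state."""
    cumulative = list(accumulate(transition[current - 1]))
    u = rng.random() * cumulative[-1]   # scaled so rounding in the row sum is harmless
    for state, limit in enumerate(cumulative, start=1):
        if u < limit:
            return state
    return len(cumulative)


def synthesize_month(transition, midpoints, n_days, rng, start_state=1):
    """Generate n_days of rainfall depths, starting from a dry previous day."""
    state = start_state
    depths = []
    for _ in range(n_days):
        state = next_state(state, transition, rng)
        depths.append(midpoints[state - 1])   # state converted to a depth in inches
    return depths


# Hypothetical 3-state example: dry, light rain, heavy rain.
P = [[0.7, 0.2, 0.1],
     [0.5, 0.3, 0.2],
     [0.4, 0.3, 0.3]]
MIDPOINTS = [0.0, 0.25, 1.0]

july = synthesize_month(P, MIDPOINTS, n_days=31, rng=random.Random(1971))
print(july)
```

To follow step 7, a separate matrix would be supplied for each month; to follow step 8, the month loop would simply be repeated for as many years as desired, carrying the last state of one month into the next.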


Results  and  Discussion 

Twenty years of daily rainfall values were synthesized for Bithlo by use of the first-order Markov chain model. Monthly historic values and the synthesized values are presented in tables 7 and 8. To check the adequacy of the first-order Markov chain model to represent the daily rainfall process for Bithlo, the Kolmogorov-Smirnov two-sample test was used for July. This test is used to test whether the two samples, that is, the samples from the historic and synthesized data, have been drawn from the same population. If they are drawn from the same population, then their cumulative frequency distributions should show only random deviations from the distribution of the population.

To apply the test, cumulative frequencies were derived from the historic and the synthesized states. A significance level of α = 0.01 was used for the test. The computed cumulative frequencies from the historic and synthesized states are presented in

Table 4.—Daily rainfall conversion to daily rainfall state


Days

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

0  

6 

1  

11 

4 

2  

3  

7 

7 

2 

8 

4  

4 

9 

7 

5  

8 

3 

7 

6  

5 

7 

8 

7  

10 

4 

9 

5 

10 

8  

13 

8 

9  

10 

7 

7 

14 

6 

10  

5 

2 

7 

4 

9 

4 

11  

9 

7 

3 

5 

12  

2 

4 

13  

2 

5 

9 

7 

14  

2 

5 

3 

15  

5 

7 

6 

7 

3 

16  

9 

3 

3 

4 

4 

9 

17  

4 

5 

5 

7 

18  

5 

9 

6 

4 

8 

5 

19  

2 

7 

9 

11 

12 

20  

8 

3 

21  

5 

7 

7 

14 

22  

7 

9 

6 

5 

12 

9 

5 

23  

7 

7 

24  

8 

10 

9 

5 

25  

9 

7 

9 

9 

26  

2 

9 

10 

4 

9 

27  

7 

6 

6 

28  

2 

6 

4 

4 

5 

2 

4 

29  

5 

3 

5 

30  

2 

6 

6 

4 

2 

8 

31  

2 
table 9 and plotted in figure 1. The largest absolute
difference between the two distributions is
the test statistic, D:

$$D = \max_{X} \left| S_h(X) - S_s(X) \right|$$

where $S_h(X)$ and $S_s(X)$ are the cumulative frequency
distributions for the historic states and the
synthesized states.


The tabulated value of Dcr at the 0.01 level of significance
(4) is 0.094. As the calculated D value is
lower than the tabulated Dcr value, it can be said
that the first-order Markov model adequately represents
the daily rainfall process for Bithlo.
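
The comparison can be reproduced with a few lines of Python. This is a sketch using the cumulative frequencies listed in table 9; the historic value for state 9 is read here as 0.963, which is an assumption because the printed digit is unclear, and the function name `ks_two_sample_d` is illustrative.

```python
# Cumulative frequencies by state (1 to 14) as listed in table 9.
# The historic value for state 9 is read as 0.963 (an assumption).
HISTORIC = [0.611, 0.649, 0.675, 0.724, 0.783, 0.814, 0.882,
            0.908, 0.963, 0.979, 0.985, 0.991, 0.994, 1.000]
SYNTHESIZED = [0.601, 0.652, 0.674, 0.723, 0.794, 0.849, 0.930,
               0.952, 0.979, 0.982, 0.985, 0.997, 1.000, 1.000]

D_CRITICAL = 0.094  # tabulated value quoted in the text for the 0.01 level


def ks_two_sample_d(cdf_a, cdf_b):
    """Return (D, state): the largest absolute difference and where it occurs."""
    diffs = [abs(a - b) for a, b in zip(cdf_a, cdf_b)]
    d = max(diffs)
    return d, diffs.index(d) + 1


if __name__ == "__main__":
    d, state = ks_two_sample_d(HISTORIC, SYNTHESIZED)
    print(f"D = {d:.3f} at state {state}")
    print("below tabulated value" if d < D_CRITICAL else "above tabulated value")
```

The largest difference falls at state 7, in agreement with the note accompanying the tables, D = |0.882 - 0.930| = 0.048.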

The storm lengths from the synthesized and the
historic daily rainfall values were also subjected to
the Kolmogorov-Smirnov two-sample
test. The tabulated Dcr value at the 0.01 level of sig-


Table 5.—Frequency of daily rainfall process for July at Bithlo

State                              State during day (T+1)
during
day, T      1    2    3    4    5    6    7    8    9   10   11   12   13   14   ΣFi

   1      120    9    3    8    8    3   14    4    9    1    0    1    0    1   181
   2        7    1    0    0    0    1    0    1    1    0    0    0    0    0    11
   3        3    0    0    0    2    1    1    0    0    0    0    0    0    0     7
   4       11    0    1    0    2    0    0    0    0    0    1    0    0    0    15
   5        8    0    0    1    2    1    1    1    1    0    0    1    0    0    16
   6        4    0    0    2    3    0    0    0    1    0    0    0    0    0    10
   7        6    0    1    2    0    1    1    1    0    3    0    0    0    0    15
   8        5    1    0    0    1    0    0    0    0    0    0    0    0    1     8
   9        4    1    2    2    0    0    4    1    2    0    0    0    0    0    16
  10        1    1    0    0    0    1    0    0    1    0    0    0    1    0     5
  11        1    0    0    0    0    0    0    0    0    0    0    0    0    0     1
  12        1    0    0    0    0    0    0    0    0    0    0    0    0    0     1
  13        0    0    0    0    0    0    0    0    0    1    0    0    0    0     1
  14        0    0    0    0    1    0    0    0    0    1    0    0    0    0     2


Table 6.—Estimates of cumulative transition probability for the daily rainfall process
during July at Bithlo

[Probabilities in parentheses are the interpolated probabilities for synthesis purposes.]

State                                        State during day (T+1)
during
day, T   1        2        3        4        5        6        7        8        9        10       11       12       13       14

   1     0.680    0.731    0.743    0.783    0.829    0.847    0.927    0.945    0.985    0.990    (0.992)  0.995    (0.998)  1.000
   2     .636     .727     (.748)   (.769)   (.790)   .818     (.862)   .909     1.000    1.000    1.000    1.000    1.000    1.000
   3     .428     (.499)   (.570)   (.643)   .714     .857     1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000
   4     .714     (.749)   .785     (.877)   .929     (.941)   (.953)   (.965)   (.977)   (.989)   1.000    1.000    1.000    1.000
   5     .500     (.521)   (.541)   .562     .690     .752     .814     .876     .938     (.959)   (.979)   1.000    1.000    1.000
   6     .400     (.465)   (.530)   .600     .900     (.925)   (.950)   (.975)   1.000    1.000    1.000    1.000    1.000    1.000
   7     .400     (.433)   .467     .599     (.632)   .666     .733     .800     (.900)   1.000    1.000    1.000    1.000    1.000
   8     .625     .750     (.791)   (.835)   .875     (.889)   (.903)   (.917)   (.931)   (.945)   (.959)   (.973)   (.987)   1.000
   9     .235     .294     .411     .528     (.607)   (.686)   .766     .883     1.000    1.000    1.000    1.000    1.000    1.000
  10     .200     .400     (.450)   (.500)   (.550)   (.600)   (.665)   (.730)   .800     (.850)   (.900)   (.950)   1.000    1.000
  11     1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000
  12     1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000
  13     (.100)   (.200)   (.300)   (.400)   (.500)   (.600)   (.700)   (.800)   (.900)   1.000    1.000    1.000    1.000    1.000
  14     (.100)   (.200)   (.300)   (.400)   (.500)   (.600)   (.700)   (.800)   (.900)   1.000    1.000    1.000    1.000    1.000

Figure 1.—States. [Plot title: Comparison of historical and synthesized states
for the month of July for Bithlo. Curves: synthesized states from the model;
historic states.]


nificance is 0.163 (sample size 100), and the maximum
absolute computed value is 0.120. Because the
computed value is lower than the tabulated value, the
historic and synthesized storm lengths do not differ
significantly at the 1-percent level.

A χ² test was used to test the frequencies
derived for the number of wet days from the historic
and synthesized daily rainfall values. Calculated
χ² values, together with the table value for the 0.01
level of significance, are presented in table 10.

All of the above tests (Kolmogorov-Smirnov and
χ²) indicate that the first-order Markov chain
model is adequate for the daily rainfall synthesis
procedure. However, Franz (5) states that the application
of statistical tests is hazardous because the
assumption of random sampling is often violated.
He states further that personal judgment based on
experience and tempered by rough statistical calculations
should be given more weight than the so-called
"precise and powerful" normal theory tests.

Comparisons of the monthly means, maximum and
minimum values, and the average number of wet
days have also been made from the historic and
synthetic data. These values are presented in tables
11 and 12.

From table 11 and figure 2, it can be seen that the
historic and the synthesized means match fairly
well, except for September. The difference between
the two means for this month is more than 3.5
inches. In general, the synthesized mean monthly
values are lower than the historic means.

Comparisons of the historic and synthesized maximum
and minimum values are also presented in
table 11. For March, July, October, and November,
historical maximum values are higher than the
synthesized maximum values. For February, May, June,
and December, the synthesized maximum values
are higher than the historic values, perhaps because
synthesized values are the midpoint values, which
are almost static. If a random component were
added to each midpoint value rather than assigning
the midpoint as such, the assignment of rainfall
depth values would be more flexible, and the
differences between the historic and the synthesized
values might be reduced.
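
One way to add such a random component is sketched below. It is only an illustration of the suggestion above: the interval bounds are hypothetical stand-ins for those of table 3, and drawing the depth uniformly within the interval is an assumption, since the text does not specify a distribution.

```python
import random

# Hypothetical state intervals in inches (lower, upper); table 3 holds the real ones.
INTERVALS = {2: (0.01, 0.10), 3: (0.10, 0.25), 4: (0.25, 0.50)}


def midpoint_amount(state):
    """Rainfall depth assigned as the midpoint of the state's interval."""
    lo, hi = INTERVALS[state]
    return (lo + hi) / 2.0


def random_amount(state, rng):
    """Rainfall depth drawn uniformly within the state's interval (assumed form)."""
    lo, hi = INTERVALS[state]
    return rng.uniform(lo, hi)


if __name__ == "__main__":
    rng = random.Random(7)
    for state in sorted(INTERVALS):
        print(state, midpoint_amount(state), round(random_amount(state, rng), 3))
```

With this change, two days in the same state would no longer receive identical depths, so monthly maxima and minima would be free to vary more than under the fixed midpoints.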

The  historical  and  synthesized  minimum  values 
match  fairly  well  except  for  March,  June,  July, 
August,  and  September. 

Comparison of the average number of wet days
in a month is presented in table 12 and figure 3.
June, July, August, and September are the rainy
months in Florida, and for these months the
historical and synthesized numbers of
wet days match fairly well. For the rest of the
months, there are some discrepancies.


Concluding  Remarks 

Several  models  are  under  study  by  the  Central 
and  Southern  Florida  Flood  Control  District  in  a 
continuing  search  for  optimal  management  and 
effective  control  of  water  resource  systems.  These 
models  comprise  the  (a)  watershed  systems  model, 
(b)  economic  model,  and  (c)  rainfall  model  (fig.  4). 

The rainfall model discussed in this paper was
developed with the intention of providing synthetic
input data to the watershed systems model. So
far, the model has not actually been applied.
Table 7.—20 years of synthesized daily rainfall values summed together for months

Run No.   Jan.   Feb.   Mar.   Apr.   May    June   July   Aug.   Sept.  Oct.   Nov.   Dec.

   1      2.8    4.7    4.0    1.3    2.0    6.5    6.2    5.5    4.5    2.5    0.7    2.0
   2      3.0    6.7    7.1    1.2    2.0    18.0   11.4   12.4   10.4   4.9    1.5    1.8
   3      5.3    7.1    7.5    3.0    2.9    10.7   7.6    8.3    8.0    4.0    2.2    5.1
   4      .2     1.5    1.7    0      .2     5.7    5.5    3.7    12.5   1.4    0      .1
   5      .2     1.2    .2     .9     .1     6.7    4.0    12.2   11.5   .3     0      .1
   6      1.7    3.7    2.7    1.9    7.7    4.7    7.5    7.0    5.3    2.7    1.6    1.1
   7      1.5    2.9    3.2    1.2    2.7    3.7    5.8    15.1   4.4    1.8    .7     2.1
   8      2.3    5.8    2.7    1.3    1.3    9.0    10.2   8.8    5.5    2.5    1.5    1.7
   9      2.0    4.6    3.7    1.7    9.7    9.5    8.6    7.1    4.8    2.6    1.5    1.1
  10      3.1    6.0    6.0    .8     2.5    12.2   10.9   12.1   9.9    5.0    1.8    2.9
  11      1.8    4.0    2.4    1.0    7.0    7.0    6.7    6.3    5.1    2.7    .7     2.3
  12      1.5    2.4    3.0    1.2    7.5    5.2    6.3    4.3    5.1    2.0    1.6    .8
  13      1.5    2.7    2.8    .1     .9     5.2    7.4    6.3    13.4   1.6    .7     .8
  14      .5     2.2    1.5    1.2    .2     4.5    4.7    13.5   3.2    1.3    .7     .2
  15      3.5    4.7    3.5    1.6    1.6    9.7    7.3    7.3    4.2    3.0    .7     2.2
  16      3.8    5.5    4.5    .1     7.7    6.7    10.1   8.3    14.9   3.3    1.5    2.3
  17      .7     2.5    2.2    1.7    .3     9.2    8.5    4.8    5.1    1.4    1.5    .2
  18      3.1    7.2    4.0    1.5    6.6    9.7    10.2   11.8   14.8   3.5    .7     2.9
  19      2.0    5.5    2.3    1.5    .8     17.0   9.4    7.3    9.0    1.6    2.2    2.0
  20      1.2    2.5    3.8    .3     .8     4.7    6.3    15.0   5.0    1.1    .7     .4

Table 8.—10 years of historic daily rainfall summed together for months

Year      Jan.   Feb.   Mar.   Apr.   May    June   July   Aug.   Sept.  Oct.   Nov.   Dec.

   1      3.65   4.18   8.00   4.07   3.03   9.60   5.92   5.47   8.50   5.52   1.20   2.60
   2      1.05   4.22   13.23  1.19   1.55   11.34  19.87  5.68   12.10  2.85   .25    .68
   3      2.02   2.25   1.73   .25    2.23   5.99   4.37   11.65  13.00  1.40   1.26   1.43
   4      1.23   2.43   3.06   1.60   .36    3.40   6.51   12.01  8.28   1.51   2.42   1.38
   5      2.22   3.61   4.42   .76    7.30   10.77  2.15   9.03   9.77   .24    9.06   2.44
   6      D. 10  o.lZ   z.ou   Z.oU   Q 77 O. 1 1   7 no   14. UU   •37   oo .Vo
   7      1.68   4.34   3.62   .74    .09    4.05   12.56  10.61  4.63   4.10   0      3.92
   8      4.47   5.90   2.45   1.74   5.66   10.01  11.67  3.42   15.02  1.70   .44    .53
   9      1.51   4.88   1.08   0      .94    11.60  8.42   10.06  6.86   0      0      2.86
  10      .70    7.77   2.29   1.50   6.07   16.98  9.62   8.91   2.80   9.25   2.56   .55


Table 9.—Cumulative frequency of the historic and
the synthesized daily rainfall states for July,
for Bithlo, Fla.

           Historic        Synthesized
State      cumulative      cumulative
           frequency       frequency

   1       0.611           0.601
   2       .649            .652
   3       .675            .674
   4       .724            .723
   5       .783            .794
   6       .814            .849
   7       .882            .930
   8       .908            .952
   9       .963            .979
  10       .979            .982
  11       .985            .985
  12       .991            .997
  13       .994            1.000
  14       1.000           1.000



Table 10.—χ² test statistics for the frequencies of
the number of wet days

Month          Calculated      χ² 0.99 table
               χ² values       value

January        0.3020          6.63
February       .0080
March          .0430
April          .1560
May            .3430
June           .0250
July           .0008
August         .0770
September      .0560
October        .2430
November       .1630
December       .0140


Note: D = |0.882 - 0.930| = 0.048


Table 11.—Statistical properties comparison between synthesized and observed rainfall

                          Jan.   Feb.   Mar.   Apr.   May    June   July   Aug.   Sept.  Oct.   Nov.   Dec.

Historic mean             2.37   4.28   4.24   1.46   3.10   9.26   8.82   9.08   8.80   2.90   1.75   1.68
Synthesized mean          2.08   4.17   3.44   1.18   2.92   8.28   7.73   8.85   5.37   2.46   1.12   1.60
Historic max. value       5.16   5.90   13.23  4.07   7.30   16.98  19.87  14.00  15.02  9.25   9.06   3.92
Synthesized max. value    5.03   7.10   7.50   3.00   9.70   18.00  11.40  15.10  14.90  5.00   2.20   5.10
Historic min.             .70    2.25   1.08   .25    .09    3.40   2.15   3.42   2.80   .24    .25    .53
Synthesized min.          .20    1.20   .02    .10    .10    4.50   4.00   2.20   1.50   .30    0      .10


Table 12.—Comparison of average number of wet days in a month

Mean                Jan.   Feb.   Mar.   Apr.   May    June   July   Aug.   Sept.  Oct.   Nov.   Dec.

Historic wet        4.0    4.8    5.8    2.3    5.7    9.7    12.1   10.4   8.7    4.1    2.2    2.8
Synthesized wet     2.9    4.6    5.3    1.7    4.3    9.2    12.2   9.5    8.0    5.1    1.6    2.6

Figure 2.—Months.

Figure 3.—Months.

APPENDIX


FLOW CHART FOR SYNTHESIS PROCEDURE


READ THE TRANSITIONAL
MATRIX, TRAN (I,J) AND
THE PRE-ASSIGNED DAILY
RAINFALL VALUES.


Call Random Number Generator
for the cum. freq. to go from
State I on Day, t, to
State J on Day, t+1


DO J = 1, Number of Transitional
States


INITIALIZE SPACE FOR AS
MANY YEARS OF MONTHLY TOTAL
DATA AS DESIRED


IF [State J - TRAN (I,J)]


DO L = 1, N

Where N = Number of Years
of a particular
month's data desired.


RFALL (K,L) = RJ(J)

RTOT (L) = RTOT + RFALL (K,L)


Call random number generator
STAT I, Scale it to the number
of states on day, t


Interchange J to I


Output
[RFALL (K,L), K = 1, LM]
RTOT (L), L = 1, N


DO K = 1, LM

Where LM = The number of days
in a particular month

Figure  4.  —  Schematic  operational  watershed  model 
STOCHASTIC MODELS OF SPATIAL AND TEMPORAL DISTRIBUTION OF

THUNDERSTORM RAINFALL ¹


By H. B. Osborn, L. J. Lane, and R. S. Kagan ²


Abstract 

A simplified stochastic model based on airmass
thunderstorm rainfall data from the 58-square-mile
Walnut Gulch Experimental Watershed in southeastern
Arizona is being developed at the Southwest
Watershed Research Center, Tucson, Ariz. Records
from the 95 rain gage network on this watershed
provide valuable information on airmass thunderstorm
rainfall in the Southwestern United States.
Probability distributions are being used to model random
variables (number of cells, spatial distribution of
the cells, and cell center depths) of thunderstorms
in a summer rainy season. A computer program
produces synthetic thunderstorm rainfall based on
these distributional assumptions. The synthetic data
are compared, with respect to storm center depths
and isohyetal map characteristics, with data from
the dense rain gage network on Walnut Gulch.

The daily and hourly chances of occurrence of
seasonal airmass thunderstorm rainfall are modeled.
Efforts are being made to model the temporal
distribution of rainfall from individual cells within the
airmass thunderstorm.

Finally,  the  question  of  model  transferability 
to  other  regions  and  locations  is  tied  to  defining 
regional  meteorology  and  local  topography. 

Introduction 

Chow (1) and others have defined and differentiated
among deterministic, stochastic, and probabilistic
processes and models in hydrology, pointing
out that stochastic processes follow probabilistic
laws and are time dependent, whereas purely probabilistic
models are time independent. In short,
stochastic modeling in hydrology is the sequential
generation of hydrologic information considered
wholly or partly random in nature. In this paper,
stochastic models of the spatial and temporal
distribution of thunderstorm rainfall are considered,
and an example based on airmass thunderstorm
rainfall is formulated. The authors are more familiar
with thunderstorms of the Southwestern United
States than with those of any other region, so the discussion
and analyses will be based on southwestern
thunderstorms.


¹ Contribution of the Agricultural Research Service, USDA, in
cooperation with the Arizona Agricultural Experiment Station,
Tucson, Ariz.

Thunderstorms 

Thunderstorms are an important source of rainfall
in the Southwest. Because of the extreme
variability in thunderstorm rainfall both in time and
space and the difficulty in measuring this variability,
most publications on the subject have been more
qualitative than quantitative. Among the publications
of interest are those by MacDonald (9, 10),
Sellers (21), Woolhiser and Schwalen (22), Osborn
and Reynolds (18), Osborn (14, 15), Osborn and
Hickok (16), Drissel and Osborn (3), Fogel (5),
and Fogel and Duckstein (6). The last nine of these
publications also contain attempts at quantifying
thunderstorm rainfall as well as containing
qualitative description.

Petterssen (19) made the following distinction
between thunderstorm types.

Outside the intertropical belt, thunderstorms are observed
to occur in three easily recognized patterns. (1) When an air mass
is convectively unstable, sufficiently warm and moist,
thunderstorms will be released in the upglide motion associated with
frontal zones. Although the storms may be widely scattered, the
general pattern moves along with the fronts with which they are
associated. They are usually referred to as frontal thunderstorms.
(2) Within more or less uniform air masses one finds an irregular
pattern of individual storms, or clusters of such storms. These,
which are usually referred to as air mass thunderstorms, show
a pronounced diurnal variation with a maximum in the afternoon
or early evening. (3) Analyses of radar scopes show that
thunderstorms not associated with fronts often have a tendency to be
arranged in lines or bands more or less along the direction of the
wind at low levels. These are called line thunderstorms.

Unfortunately, the delineation between thunderstorm
types is not always so easily recognized.
Regions subject to significant numbers of airmass
thunderstorms, such as the Southwestern United
States, also may be subject to varying degrees of
frontal activity, and sometimes these fronts are
difficult to detect. Also, different types of
thunderstorms may be dominant in different parts of a
region, and frontal activity may vary within a region.

For example, on the arid and semiarid rangelands
of southeastern Arizona and southwestern New
Mexico (as well as many other regions), airmass
thunderstorms produce well over one-half of the
average annual precipitation and almost all of the
annual surface runoff. In other arid and semiarid
regions of the Southwest, frontal activity is more
common, and either frontal or airmass thunderstorms,
or both, are common. At higher elevations in the
Southwest, winter rain and snow are more important
sources of water yield to the valleys below.

The Southwest is a region where fronts tend to
dissipate or disappear from weather maps; yet they
may still influence thunderstorm activity.
Thunderstorm buildup will vary with the amount and
distribution of moist air aloft, temperatures at various
levels, and the winds aloft. In southern Arizona, for
example, airmass thunderstorms result from a
combination of convective heating and moist air
moving into the region from the south, generally
from the Gulf of Mexico, but occasionally from the
Pacific Ocean. Moist air from the Gulf of Mexico
usually is drier than that from the Pacific (because
of distance traveled and mountains crossed), and
when thunderstorm activity on a given day or during
a few consecutive days is more prolonged and the
thunderstorms are more closely spaced than usual,
the source of moisture is usually the Pacific Ocean.
However, there are "in-between" regions where,
without meteorological information, one cannot
guess the origin of the moisture. Also, if atmospheric
conditions are such that the flow of moist air from
the Gulf of Mexico continues uninterrupted for a
long enough period, thunderstorm activity may be
similar to that which occurs when moist air moves
into the region from the Pacific. In southeastern
Arizona, almost all runoff-producing rainfall on
watersheds of 100 square miles or less appears, at
least from analysis of recording rain gage records,
to result from airmass thunderstorms.

Sellers (21) described occasional September
storms as "rampaging" across southern Arizona.
These storms develop as warm, moist air is pushed
into southern Arizona from the Pacific by tropical
storms. A combination of one or more of three
conditions (orographic lifting, convective heating, and
colder air pushing from the north under the advancing
warm, moist air) produces more general rains
with thunderstorm activity throughout the period,
rather than just in the afternoon and evening hours.
In reality, these storms probably should be a
subclass under frontal thunderstorms because convective
heating is an important part of much of the
thunderstorm activity within the overall storm
period. Possibly, they should be classified as
frontal-convective thunderstorms.

In general, the occurrence of a thunderstorm at
a particular point or over a particular small area
within a climatic region appears purely random, and
the depth and intensity of rainfall and the area
covered by varying depths and intensities of rainfall
appear, within limits, to be random. Therefore,
thunderstorm rainfall appears to fit very neatly the
definition of a stochastic process in hydrology.
However, there should be considerable latitude in the
assumptions and mathematical representations of
such thunderstorms depending upon the amount
and accuracy of available information and the
proposed use of the model.

Stochastic  Thunderstorm  Models 

Storm systems producing thunderstorms are
difficult to classify without simplification; yet
simplification is necessary both in definition and
classification of the systems and in the eventual
modeling of the systems. Rosenblueth and Wiener
(20) stated:

No substantial part of the universe is so simple that it can be
grasped and controlled without abstraction. Abstraction consists
in replacing the part of the universe under consideration by a
model of similar but simpler structure. Models, formal or
intellectual on one hand, or material on the other, are thus a central
necessity of scientific procedure.

In general, stochastic thunderstorm rainfall
models are either physically based, data based, or
both. Ideally, models based on atmospheric and
topographic conditions might be preferred, but
realistically, most models are based on data
collected at the ground surface and are developed
without atmospheric parameters. Since thunderstorm
rainfall is highly variable both spatially and
temporally and this variability is difficult
to measure, any great degree of sophistication of
thunderstorm rainfall models not based on atmospheric
data may be suspect. This is particularly
true if the end result is to predict runoff, where
uncertainties in watershed response add to the
uncertainty of the output and may limit runoff
models to rather simple inputs and "black box"
techniques.

LeCam (8) developed a theoretical model for
rainfall as a random phenomenon incorporating
yearly periodicity. The model was described as a
clustering process of the type presented by Neyman
and Scott (12, 13). LeCam's lucid description of
rainfall occurrence and his comments on validating
or testing such models are especially relevant as
the complexity of models increases.


Airmass  Thunderstorm  Rainfall 
Model 

As an example, a simplified stochastic model
incorporating the spatial and temporal distribution
of thunderstorm rainfall was developed from rain
gage records of airmass thunderstorm rainfall on
the Walnut Gulch Experimental Watershed,
Tombstone, Ariz. (fig. 1). The Southwest Watershed
Research Center of the Agricultural Research
Service operates this 58-square-mile experimental
rangeland watershed. The watershed is representative
of semiarid rangelands throughout much of the
Southwest. Of the 95 recording rain gages on the
watershed, 80 have been in continuous operation
for over 10 years. The means and ranges of the
variables used in this model were determined from
the records from Walnut Gulch. Total storm rainfall
for eight selected events is shown to illustrate
the variability of thunderstorm rainfall and to
indicate visually the difficulties in modeling such
rainfall (fig. 2(A-H)).


Figure 1.—The Walnut Gulch Watershed.

A stochastic model of thunderstorm rainfall for
Walnut Gulch is being developed in three parts.
The first part, or routine, in the model determines
the chance of daily and hourly occurrence of a
significant event. Included in this routine is the
chance of more than one event occurring on the
same day. The second part of the model generates
the total storm rainfall through addition of
individual synthetic storm cells regardless of time
of occurrence within the storm. Significant progress
has been made on the first two parts of the model.

The final part of the model involves generating
the cells sequentially and continuously, possibly
describing the storm with a series of isohyetal
maps of short duration (possibly 10 minutes).
Development of the third part of the model will
continue after possible modifications and final
verification of the first two parts.

Occurrence  of  an  Airmass  Thunderstorm 
Event 

An initial attempt at modeling the probability of
a thunderstorm occurring during the summer rainy
season involved assuming a probability distribution
for the start of the rainy season. Once the season had
started, the occurrence or nonoccurrence of an
event was modeled as a Bernoulli variable with
constant parameter throughout the season. However,
considering the assumptions about moist air
movement stated in the previous section, the assumption
of a constant probability of occurrence (Bernoulli
parameter) throughout the season was not
consistent with them. Analysis of rainfall data from the Walnut
Gulch Experimental Watershed indicates that the
probability of storm occurrence varies considerably
within the rainy season.

Therefore, a variable probability of occurrence of
significant thunderstorm rainfall based on 10 years
of precipitation data from Walnut Gulch is used to
estimate the probability of a significant storm
occurring somewhere over the 58-square-mile
watershed. A significant storm is specified as one with at
least 0.25 inch of rainfall recorded on at least two
adjacent rain gages. Effects of modeling the seasonal
distribution of daily rainfall by incorporating a
varying Bernoulli parameter are discussed in
Appendix I. Figure 3 shows the 5-day running mean
for the number of significant storms recorded
(1960-69) on the Walnut Gulch watershed. The
smoother curve shown in figure 3 is arbitrarily
adopted in the model. The curve is similar in shape
to the point frequency value from long-term U.S.
Weather Bureau records from Tombstone, Ariz.


Figure 2.—Isohyetal maps of selected thunderstorm rainfall, Walnut Gulch watershed precipitation (in inches): A, Storm of August 12,
1963 (1200). B, Storm of August 16, 1963 (1640). C, Storm of July 13, 1964 (1600). D, Storm of September 11, 1964 (1700). E, Storm
of July 29, 1966 (1830). F, Storm of July 7, 1967 (1500). G, Storm of August 3, 1967 (1700). H, Storm of August 13, 1967 (1400).


Additional work is in progress to facilitate
extrapolating point frequencies from Weather Bureau
and other data to provide storm frequencies for
finite-sized watersheds throughout the Southwest.
The relationship between point and areal frequency
on finite-sized watersheds for different climates and
topographies is essential in regionalizing such a
model (25).
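
The 5-day running mean of daily storm counts mentioned above can be computed as in the following sketch. The storm counts are made-up numbers, and a centered window is assumed, since the text does not say how the 5 days are positioned.

```python
def running_mean(counts, window=5):
    """Centered running mean; days without a full window are skipped."""
    half = window // 2
    return [
        sum(counts[i - half:i + half + 1]) / window
        for i in range(half, len(counts) - half)
    ]


if __name__ == "__main__":
    # Hypothetical number of significant storms on each calendar day, summed over years
    storms_per_day = [0, 1, 1, 2, 3, 2, 4, 3, 2, 1]
    print(running_mean(storms_per_day))
```

Dividing each smoothed count by the number of years of record would give a daily occurrence frequency of the kind used as the Bernoulli parameter.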

The procedure used here for generating synthetic
airmass thunderstorm rainfall data over a
finite-sized area is summarized in figure 4. A table of
probabilities derived from the "smoothed" curve
in figure 3 is used as the Bernoulli parameter p,
that is, the probability of a significant storm
occurring anywhere on the watershed on a given day, in
the sequential generation of a Bernoulli variable.
If the Bernoulli variable is equal to zero (a failure),
then there is no significant storm on the given day.
The date is then indexed and the next Bernoulli
variable simulated. If there is a significant storm,
then the beginning time of the storm (0000 to 2400
in military time) is generated as a truncated normally
distributed random variable with mean starting
time 1700 (5:00 p.m.) and a standard deviation of
3.5 hours (2).

The next step is to simulate the airmass thunderstorm
(described in the next section) and to print
the necessary data as to date, time, location, and
magnitude. The model also allows for multiple
storms occurring on the same day. If a storm occurs
between 0500 and 1700, there is a reduced probability
of another storm occurring 3 hours or more
after the beginning of the first storm. If the second
storm also occurs before 1700, the same reduced
probability, determined by trial and error as one-fifth
the original rainfall chance, is used to predict
a third storm, and so on. That is, the occurrence of
subsequent storms is also modeled as a Bernoulli
variable.
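
The occurrence procedure described above can be sketched in a few lines of code. This is only an illustration under stated assumptions, not the program used in the study: the function names are hypothetical, the truncation of the normal beginning time is done by simple rejection, and the beginning time of a second or later storm is drawn uniformly between 3 hours after the previous storm and midnight, a choice the text does not specify.

```python
import random

def storm_start_time(rng, mean=17.0, sd=3.5):
    """Beginning time in hours, normal(mean, sd) truncated to 0000-2400."""
    while True:  # rejection sampling implements the truncation
        t = rng.gauss(mean, sd)
        if 0.0 <= t < 24.0:
            return t

def simulate_season(p_by_day, rng, reduction=0.2, min_gap=3.0):
    """Return (day_index, start_hour) pairs for one season.

    p_by_day holds the daily Bernoulli parameter taken from the
    smoothed seasonal curve (one value per day).
    """
    events = []
    for day, p in enumerate(p_by_day):
        if rng.random() >= p:  # Bernoulli failure: no significant storm
            continue
        t = storm_start_time(rng)
        events.append((day, t))
        # Further storms on the same day: one-fifth of the original
        # chance, only while the previous storm began in 0500-1700.
        while 5.0 <= t <= 17.0 and rng.random() < reduction * p:
            # Assumption: later start is uniform from t + 3 h to midnight.
            t = rng.uniform(t + min_gap, 24.0)
            events.append((day, t))
    return events

season = simulate_season([0.3] * 123, random.Random(1971))
```

The list `[0.3] * 123` stands in for the table of daily probabilities for a June 15 through October 15 season; the constant value 0.3 is a placeholder, not a value from the Walnut Gulch records.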

Logically, the model should allow for persistence,
the tendency for wet days to follow wet days
and dry days to follow dry days. However, persistence
was not included in this simplified model.

Simulation of Airmass Thunderstorms

Once the date and starting hour have been determined,
the synthetic storm itself is generated
through a group of equations implemented by a
program referred to as CELTH-4 (fig. 5). CELTH-4
consists of a unit cell model coupled with a technique
of randomly grouping these cells as a means
of describing thunderstorm rainfall shapes and
distributions. The model development was a combination
of simplification, abstraction, physical
arguments, and trial and error, with the output
as the final test of the combination of distributions.

The  unit  cell,  building  block  of  this  rainfall  model, 
was  initially  chosen  to  be  circular,  with  the  rainfall 
(D)  at  any  point  within  the  cell  dependent  only  on 
distance  (r)  from  the  center.  Individual  cells  appear 
more  elliptical  than  circular  with  a  long-short  axis 
ratio  of  H:l  (6,  75),  and  such  a  further  refinement 
might  be  justified  depending  upon  the  stated  use 
of  the  model.  Analysis  of  rainfall  data  collected 
from  the  Walnut  Gulch  Experimental  Watershed 
(17)  indicates  that  the  approximate  relationship 
between  the  distance  r  and  corresponding  rainfall 
D  (in  inches)  is: 

$D = 0.9 D_0 [1 - K \ln(\sqrt{\pi}\, r)]$  (1)

for $r > 1/\sqrt{\pi}$ miles, and

$D = D_0 [1 - \sqrt{\pi}\, r/10]$  (2)

for $r \le 1/\sqrt{\pi}$ miles,

where $D_0$ is the center depth (in inches) of the unit
cell; $K = 1/\ln(\sqrt{\pi} R)$; $R$ is the cell radius; and ln
is the natural logarithm (log to the base e).
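
As a check on the unit cell relationship, the two branches can be written as one function. The sketch below assumes the inner branch has the form D = D0 [1 - sqrt(pi) r/10], which makes both branches equal 0.9 D0 at r = 1/sqrt(pi), and it assumes zero rainfall beyond the cell radius R; the function name and the sample values are illustrative only.

```python
import math

def cell_depth(r, d0, radius):
    """Rainfall depth (inches) at distance r (miles) from the cell center.

    d0 is the center depth and radius is the cell radius R, which must
    exceed 1/sqrt(pi) miles for the logarithmic branch to apply.
    """
    sqrt_pi = math.sqrt(math.pi)
    if r <= 1.0 / sqrt_pi:
        # Inner core, equation 2: linear decrease from d0 to 0.9 d0.
        return d0 * (1.0 - sqrt_pi * r / 10.0)
    if r >= radius:
        # Assumption: no rainfall from this cell outside its radius.
        return 0.0
    k = 1.0 / math.log(sqrt_pi * radius)
    # Outer part, equation 1: logarithmic decrease to zero at r = R.
    return 0.9 * d0 * (1.0 - k * math.log(sqrt_pi * r))

profile = [cell_depth(0.5 * i, 2.0, 3.0) for i in range(7)]
```

With this form the depth is continuous at r = 1/sqrt(pi) and falls to zero at r = R, which is the behavior implied by the definition of K.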

Total storm radius $R$ and the center depth $D_0$ are
considered constant within each cell. To insure
flexibility in both shape and rainfall distributions
between unit cells, $D_0$ is randomly generated for
each cell. Rainfall records on Walnut Gulch suggest
that individual cell center depths can be approximated
by a negative exponential distribution generated
by the equation:

$D_0 = -\bar{D}_0 \ln(1 - U)$  (3)

where $\bar{D}_0$ is the mean cell center depth obtained
from rainfall data and $U$ is a uniform random variable,
$0 < U \le 1$.

In this distribution and in subsequent distributions,
$U$ is approximated by pseudo-random numbers
from a random number generator. By keeping $R$
constant and varying $D_0$ in this manner, a variety
of rainfall configurations is obtained, and the
rainfall at any point within the cell is determined in
terms of the generated parameter, $D_0$, and the variable,
$r$.

The choice of the exponential distribution for individual
cell center depths also arises from assuming
a multicellular model with total storm rainfall
modeled as a gamma variable. Specifically, the summation
of exponential variables produces a
random variable with a gamma distribution since the
gamma densities are closed under convolution (4).
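
The generation of center depths by the inverse transform can be illustrated as follows. This is a minimal sketch, assuming the generating equation D0 = -(mean depth) ln(1 - U); the function names and the mean depth of 0.5 inch are hypothetical. Python's generator returns U in [0, 1), which keeps the logarithm finite.

```python
import math
import random

def center_depth(mean_depth, u):
    """Inverse-transform draw from a negative exponential distribution."""
    return -mean_depth * math.log(1.0 - u)

def storm_center_depths(mean_depth, n_cells, rng):
    """One center depth per cell; their sum is a gamma variable."""
    return [center_depth(mean_depth, rng.random()) for _ in range(n_cells)]

rng = random.Random(42)
sums = [sum(storm_center_depths(0.5, 4, rng)) for _ in range(20000)]
mean_of_sums = sum(sums) / len(sums)  # close to 4 * 0.5 = 2.0
```

The sample mean of the summed depths approaches the number of cells times the mean cell depth, as expected for a gamma variable built from identically distributed exponential terms.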

The  next  step  was  to  describe  shapes  and  rainfall 
distributions  of  entire  storms  by  grouping  these 
cells. 

One might assume that cell occurrence is uniformly
random across the rain gage network.
However, although the center of each storm has an
equal chance of occurring at a given point on the
grid, the clustering of cells and analysis of rainfall
with respect to time on recorded rainfall isohyets
suggests that only the location of the first cell is truly
random. The remaining cells of the storm tend to
group around the first cell and at the same time preserve
a direction of storm movement. These observations
motivated the introduction of two basic
storm parameters: the average number of cells per
storm, $\bar{N}$, and the preferred direction of cell placement,
$\theta_0$.

$\bar{N}$, determined from rainfall data, is used to govern
the number of cells generated per storm ($N$). Since
$N$ is a discrete random variable, and the occurrence
rate of cells within the duration of rainfall is assumed
constant, it is assumed to have a Poisson distribution
limited at the lower end by three cells as suggested
by Petterssen (19). The average number of
cells was determined roughly from Walnut Gulch
data, and the chance of having more than seven
cells in one storm was very small.

$\theta_0$, on the other hand, is used to locate the direction
of the next cell generated. It is the direction
of the second cell from the first and is altered by an
amount $\Delta\theta$ for each additional cell so that $\theta_i$ is the
direction of movement, in degrees, after the $i$th
cell, or:

$\theta_i = \theta_{i-1} + \Delta\theta_{i-1}$  (4)

where $i$ goes from 1 to $N$ and $\Delta\theta_0 = 0°$.


Figure 3.-Empirically derived curve for the probability of significant storms on Walnut Gulch watershed.

Although $\theta_0$ has an equal chance of being in any
direction (and so is a uniform random variable),
$\Delta\theta$ has a directional component, and was initially
arbitrarily assigned a normal distribution about a
mean of 0° with a standard deviation of about 60°.
Although arbitrary, this did tend to sustain the direction
of storm movement in a manner similar to that
observed in real events.
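
The placement directions can be generated with a short recursion. The sketch below assumes the recursion theta_i = theta_(i-1) + delta theta_(i-1) with delta theta_0 = 0, a uniform first direction, and normal changes with mean 0 and standard deviation 60 degrees; wrapping the angles to the range 0 to 360 degrees and the function name are choices made here for illustration.

```python
import random

def cell_directions(n_cells, rng, sd_deg=60.0):
    """Return [theta_0, theta_1, ..., theta_N] in degrees (east = 0)."""
    theta = rng.uniform(0.0, 360.0) % 360.0  # theta_0: any direction
    directions = [theta]
    d_theta = 0.0                            # delta theta_0 = 0 degrees
    for _ in range(n_cells):
        # theta_i = theta_(i-1) + delta theta_(i-1), wrapped to [0, 360)
        theta = (theta + d_theta) % 360.0
        directions.append(theta)
        d_theta = rng.gauss(0.0, sd_deg)     # change applied at next step
    return directions

directions = cell_directions(5, random.Random(7))
```

Because the changes are centered on zero, successive directions wander about the first one rather than being redrawn independently, which is what sustains a direction of storm movement.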


The next step involves determining the distance
between cell centers. This is the third storm
parameter $d$, governed by its corresponding mean
$\bar{d}$ as calculated from rainfall information. However,
unlike $\bar{N}$ and $\theta$, its distribution is more difficult
to determine from available storm data. The
distribution is approximated by two lines and
generated by:


[Figure 4 flow chart. Boxes, in order: start; read table of probabilities P(I); set day index I = 1; simulate Bernoulli variable B with parameter P(I) (decision branches labeled NO and YES); generate begin time of storm; simulate air-mass thunderstorm; print watershed ID, date, time, and storm data; persistence (marked "? ? ?").]

Figure 4.-Flow chart for generation of seasonal synthetic rainfall data.

[Figure 5 flow chart. Boxes, in order: start; read storm and cell input parameters; generate the storm parameters: number of cells in storm (N) and the preferred direction of storm; generate coordinates of cell centers from location of first cell, generated change ($\Delta\theta$) in $\theta$, and generated center separation (d); generate center depths ($D_0$) for each cell; calculate total depth of rainfall at each gage due to N cells and store in array T; decision: last storm for this set (branches labeled NO and YES).]

Figure 5.-Flow chart for simulation of individual airmass thunderstorm rainfall.

$d = \sqrt{10U}$  for $U \le 0.4$  (5)

$d = 0.5\,(10 - \sqrt{60 - 60U})$  for $U \ge 0.4$  (6)

where $U$ is a uniform random variable, $0 < U \le 1$,
such that the total storm rainfall covers an average
of about 40 square miles and a maximum of about
90 square miles with the long axis never greater
than about 18 miles (17).
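
The two generating equations for the center separation can be checked numerically. The sketch below assumes the forms d = sqrt(10U) for U up to 0.4 and d = 0.5 (10 - sqrt(60 - 60U)) above it; under that assumption the density of d is made of two straight lines (a triangle on 0 to 5 miles with its peak at 2 miles), and the mean separation is 7/3 miles. The function name is illustrative.

```python
import math

def cell_separation(u):
    """Distance between cell centers (miles) from a uniform variable u."""
    if u <= 0.4:
        return math.sqrt(10.0 * u)                      # rising line of the density
    return 0.5 * (10.0 - math.sqrt(60.0 - 60.0 * u))    # falling line

# Midpoint-rule estimate of the mean separation over 0 < u < 1.
steps = 100000
mean_d = sum(cell_separation((i + 0.5) / steps) for i in range(steps)) / steps
```

The two branches meet at d = 2 miles when U = 0.4, and the largest possible separation, reached at U = 1, is 5 miles.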

The final step in the synthesis involves calculating
the location of the first storm rain cell on the
Walnut Gulch network. This is done by using a
uniform random variable to generate a rain gage
number. For example, on a 100-rain-gage grid, the
equation used is:

$I = 100U + 1$  (7)

where $I$ is the rain gage number and $U$ is a uniform
random variable, $0 < U < 1$.

For Walnut Gulch, the first cell of each storm has
an equal chance of occurrence at any rain gage on
or immediately adjacent to the watershed. This
step completes the processes involved in storm
generation from cell definition to method of placement.
Any number of storms can be generated
from the cell and storm parameters in CELTH-4.

Results 

Ten  years  of  thunderstorm  rainfall  were  generated 
for  the  Walnut  Gulch  watershed.  The  first  4  years  of 
synthetic  record  are  shown  in  figure  6,  along  with  1 
year  of  actual  data,  1963,  which  was  chosen 
randomly  from  the  1960  through  1969  records.  The 
horizontal  scale  in  figure  6  represents  the  summer 


rainy  season  from  June  15  through  October  15.  The 
vertical  scales  represent  the  maximum  total  depth 
of  point  rainfall  for  each  storm.  The  average  annual 
number  of  storms  exceeding  given  depths  and 
ranges  in  annual  values  for  the  1960-69  data  and 
the  synthetic  data  are  compared  in  table  1.  The 
location  within  the  season,  the  length  of  season,  the 
number  of  events  per  season,  and  the  maximum 
storm  depths  for  the  actual  and  synthetic  data 
appeared to correspond fairly well. However, comparison
of the synthetic and real data suggests that
persistence is a significant factor in thunderstorm
rainfall and should be added to the model. Also,
the maximum recorded depth in 10 years of record
(1960-69) was 3.45 inches compared to 2.57 inches
in the 10 years of synthetic data. For the real data,
the range of annual maximum rainfall depths was
1.63 inches to 3.45 inches with a mean of 2.48 inches.
For the synthetic data, the range was 1.70 inches
to 2.57 inches with a mean of 2.13 inches. Further
analysis is necessary to determine whether the 3.45-inch
storm has a recurrence interval greater than
10 years, whether the model underestimates maximum
depths, or both.

Frequency  plots  of  storm  beginning  times  do  not 
contradict  the  normality  assumption  for  beginning 
time,  and  the  mean  and  standard  deviation  values 
of  1700  and  3.5  hours  appear  reasonable.  Other 
studies  of  precipitation  (11)  and  of  runoff  (2)  also 
point  to  a  preponderance  of  late  afternoon  and  early 
evening  storms  in  southeastern  Arizona. 

As an example, eight synthetic events from year
5 (fig. 7 A-H) were chosen to compare with the real
events in figure 2 A-H. Comparison of eight isohyetal
maps, as well as the full 10 years of synthetic
data with real rainfall maps, suggests that while
the synthetic storms compare to some real events,
the actual thunderstorm rainfall is far more complex
than this simplified model. Further sophistication
such as elliptical cells might improve the model,
but within the limited ability to test the accuracy
of the model such a refinement might not be
justified at present.

Table 1.-Comparison of maximum storm depths between 10 years of Walnut Gulch data (1960-69) and 10 years of synthetic data

Number of events annually equal to or exceeding given depths of:

                             0.6 inch          1.0 inch          1.4 inches        1.8 inches
Item                         Max. Min. Ave.    Max. Min. Ave.    Max. Min. Ave.    Max. Min. Ave.
Actual data (1960-69)         25   10   19      13    6   10       8    2    5       6    0    3
Synthetic data (10 years)     27   13   20      14    6   10       8    3    5       4    0    2


Discussion 

The synthetic data produced were compared
both on seasonal characteristics and on individual
storm center depths and isohyetal map characteristics
with data from the dense rain gage network
on Walnut Gulch. These comparisons indicated
that such a model may generate simplified events
which have some uses, such as runoff prediction,
but it is a rather crude approximation and does not
represent all of the observed variability in thunderstorm
rainfall.

Figure 6.-Seasonal distribution of significant airmass thunderstorm events; 1 year of actual data, 4 years of synthetic data.

MISCELLANEOUS PUBLICATION NO. 1275, U.S. DEPARTMENT OF AGRICULTURE

The generated data correspond roughly to individual
thunderstorms occurring over a finite-sized
area. The storms could be superimposed over
a smaller area to completely cover it, or on to larger
areas (greater than 100 square miles) in groups to
simulate the areal distribution of several multicellular
storms over an entire region. Potential
uses of the synthetic data would determine their
application. However, additional research is
needed to determine if better, simpler, or more
complex models can be developed from available
information.

LeCam (8), in referring to his complex precipitation
model, stated, "The main difficulty in such
circumstances is that, in a model of this complexity,
it becomes more and more difficult to
estimate or test anything through purely statistical
methods." He continued that, "it is then necessary
to specify some of the elements through purely
physical arguments." These same observations
became apparent to the authors during the development
of the Walnut Gulch stochastic thunderstorm
rainfall model.

Figure 7.-Isohyetal maps of selected synthetic thunderstorm events: A, July 3 (1710), 5 cells. B, July 4 (1723), 3 cells. C, July 9 (1733), 4 cells. D, Aug. 15 (1300), 5 cells. E, Aug. 16 (1957), 5 cells. F, Aug. 29 (1641), 5 cells. G, Sept. 11 (1641), 5 cells. H, Sept. 13 (1835), 5 cells. Each map marks the center of the first cell and the watershed boundary; scale in miles.

The question of accuracy must also be answered
before such models can be put to practical use.
Although the model is based on measurement of
thunderstorm rainfall from a dense network of
recording gages, there are still considerable areas
for error. The statistical distributions have been
chosen largely for their simplicity since more involved
distributions may not be justified from known
information. To date, the overall test of the model
has been somewhat subjective; that is, within aforementioned
limits it looks good visually. More objective
methods, if possible, would be valuable. These
could include investigations of persistence of wet
and dry periods throughout the region, and possibly
direct comparison of volumes of rainfall above
specified depths between the real and synthetic
data. Furthermore, work is in progress to facilitate
using point frequencies from long-term precipitation
data to estimate storms on a finite-sized area. Thus,
frequencies of storm occurrence could be predicted
from point frequencies which are more widely
known.

Preliminary evaluations reported here suggest
that the model may generate usable synthetic airmass
thunderstorm rainfall data, depending on what
is wanted from the model. However, extensive evaluation
procedures, such as those developed by Lane
and Renard (D, need to be implemented. Such
procedures, allowing for large sample tests of the
model, would allow for a more comprehensive evaluation
and are being investigated.

For general use, the variables in this model,
revised models, or other similar thunderstorm rainfall
models, must be tied to meteorological and
topographical differences locally and between regions.
For example, in the Southwest, east of the
Continental Divide, more intense, longer lasting
thunderstorms have been recorded than in southeastern
Arizona. These storms have added components
for frontal activity and added moisture aloft.
Variables representing frontal "strength" and
available moisture could be added to the model for
airmass thunderstorms. The chance of a front moving
across a specific watershed when airmass
thunderstorms are expected to develop can be
assigned a seasonal probability, just as pure airmass
thunderstorms are assigned probabilities within a
season. Available moisture would increase or decrease
the magnitude of the event. There is a certain
chance that pure airmass thunderstorms will
occur, with the magnitude conditional on available
moisture, along with a chance that a front also will
add to the magnitude of rainfall for specific events.

Appendix

List of Variables and Parameters Used To
Describe Multicellular Airmass Thunderstorms

$K$  Constant used in rainfall relationship and
dependent only on cell radius.

$d$  Distance between cell centers generated
using a "triangular" distribution and used in
grouping cells.

$D$  Rainfall in inches within each cell; a function of
center depth $D_0$ and distance from the center $r$.

$D_0$  Cell center depth generated using a negative
exponential distribution with a mean of $\bar{D}_0$.

$\bar{D}_0$  Mean cell center depth estimated from
Walnut Gulch rainfall data.

$I$  Rain gage number where the first cell is
located; generated from a uniform distribution
where $I$ is an integer.

$N$  Number of cells per storm generated from a
Poisson distribution using $\bar{N}$ as the mean.

$\bar{N}$  Mean number of cells per storm estimated
from Walnut Gulch rainfall data.

$r$  Distance from the cell center.

$R$  Radius of unit cell estimated from isohyetal
maps of Walnut Gulch data.

$\theta_0$  Direction of the second cell as measured in
degrees from the first cell where east is defined
to be 0°; generated from a uniform distribution.

$\theta_i$  Direction of cell number $i + 1$ as measured in
degrees from the $i$th cell where east is defined
to be 0° and calculated from the equation
$\theta_i = \theta_{i-1} + \Delta\theta_{i-1}$.

$\Delta\theta_i$  Change in the direction of cell placement
generated from a normal distribution with a
mean of 0° and a standard deviation of 60°.

$U$  Uniform random variable approximated by
pseudo-random numbers from a random
number generator ($0 < U \le 1$).

A  Note  on  the  Variability  of  the  Number  of 
Storms  in  a  Season  Where  the  Occurrence 
of  Storms  is  Modeled  as  a  Bernoulli  Variable 

In the absence of persistence in daily rainfall, the
occurrence of a storm can be modeled as a Bernoulli
variable where:

$X_k = 1$ if there is a storm on day $k$, and $X_k = 0$ otherwise,  (1)

and

$P[X_k = 1] = p_k$.  (2)

With the above definitions,

$S_n = \sum_{k=1}^{n} X_k$  (3)

will be the number of rainy days in a period of length
$n$ days.

Of interest in this discussion is the expected
value of $S_n$ and the variability of $S_n$. Mathematical
expectation leads to

$E(S_n) = n\bar{p}$,  (4)

and

$\mathrm{Var}(S_n) = n\bar{p} - \sum_{k=1}^{n} p_k^2$,  (5)

where Var denotes variance, and $\bar{p}$ is the "average"
probability of rain, such that

$\bar{p} = (1/n) \sum_{k=1}^{n} p_k$.  (6)

For example, see Feller (4).

The assertion is that the variance of $S_n$ is maximum
in the absence of strong seasonality. That is,
for the same $\bar{p}$, the variance of $S_n$ is maximum when
the $p_k$ values do not vary in the season. The proof
of this assertion is complete if we can show that in
equation 5, the second term is minimum when
$p_k = \bar{p}$ for all $k$.

Let $T_{n_1}$ be the sum when all $p_k = \bar{p}$, then

$T_{n_1} = \sum_{k=1}^{n} p_k^2 = n\bar{p}^2$.  (7)

Let $T_{n_2}$ be the sum when all but two of the $p_k = \bar{p}$,
and the other items are $p_j = \bar{p} + \epsilon$ and $p_{j+1} = \bar{p} - \epsilon$
for some $j$, then:

$T_{n_2} = (\bar{p} + \epsilon)^2 + (\bar{p} - \epsilon)^2 + \sum_{k \ne j,\, j+1} p_k^2$  (8)

and thus,

$T_{n_2} = 2\bar{p}^2 + 2\epsilon^2 + (n - 2)\bar{p}^2$  (8a)

$= n\bar{p}^2 + 2\epsilon^2$.  (8b)

Clearly, $T_{n_2} > T_{n_1}$ for all $\epsilon > 0$, and the proof for
more than two of the $p_k \ne \bar{p}$ follows by induction.
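
The assertion can be illustrated with a small numerical example. The sketch below assumes the variance expression Var(S_n) = n p-bar minus the sum of the squared daily probabilities, and compares a season with constant probability against one in which two days are shifted by plus and minus epsilon; the function name and the probabilities are chosen only for illustration.

```python
def storm_count_variance(p):
    """Variance of the number of storm days for daily probabilities p."""
    n = len(p)
    p_bar = sum(p) / n
    return n * p_bar - sum(pk * pk for pk in p)

constant = [0.3, 0.3, 0.3, 0.3]    # no seasonality
perturbed = [0.4, 0.2, 0.3, 0.3]   # same average, epsilon = 0.1

v_const = storm_count_variance(constant)   # 4(0.3) - 4(0.09)
v_pert = storm_count_variance(perturbed)   # smaller by 2 * epsilon**2
```

The difference between the two variances is 2 times epsilon squared, the same quantity by which the sum of squares grows in equation 8b.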

STOCHASTIC MODEL OF DAILY RAINFALL 1


By P. Todorovic and D. Woolhiser 2


Abstract

An application of stochastic processes for
description and analysis of daily values of precipitation
is presented. The total amount of precipitation
$S(n)$ during an $n$-day period is a discrete
parameter stochastic process such that $0 \le S(n) \le S(n+1)$.
The most general form of the distribution
function, mathematical expectation, and
variance of $S(n)$ are determined. The following
special cases for the sequence of daily rainfall
occurrences are considered: (1) Sequence of independent
identically distributed random variables,
(2) sequence of independent random variables,
and (3) Markov chain. In addition, assuming that
certain regularity conditions hold, it has been
proved that $S(n)$ is asymptotically normal. The
first passage time of $S(n)$ and the corresponding
distribution function are also considered. A numerical
example for cases (1) and (3) is presented
assuming that the daily rainfall amounts are exponentially
distributed.


1. Introduction

In this paper, an attempt is made to develop a
stochastic model for description and analysis of
certain aspects of the rainfall phenomenon utilizing
daily precipitation records. The primary reason
for constructing such a model is the fact that daily
rainfall data are the most readily available and
are sufficient for many hydrological problems.

In this report, we are concerned with merely a
probabilistic treatment of the observed record.
Various climatological and other factors, such as
temperature, winds, and origin of airmasses, are
not taken into account. Therefore, the model cannot
provide physical explanations of features of
rainfall phenomena.

Consider a certain period of time which, for
example, consists of $n$ days. To each day of the
$n$-day period is associated a random variable $\eta_j$
which assumes only two values, 0 and 1, defined as
follows:

$\eta_j = 1$ if the $j$th day is wet, and $\eta_j = 0$ if the $j$th day is dry,

where $j = 1, 2, \ldots, n$. According to this definition,
the number of rainy days in this period is obviously
equal to the following sum:

$N_n = \sum_{j=1}^{n} \eta_j$  (1.1)

($N_n = 0, 1, \ldots, n$).

1 Contribution from the Colorado State University Experiment
Station and the Agricultural Research Service, USDA.

2 Associate professor of civil engineering, Colorado State University,
and research hydraulic engineer, USDA, Fort Collins,
Colo.


Let $\xi_\nu$, $\nu = 1, 2, \ldots, n$, denote the daily value of
precipitation of the $\nu$th rainy day, then the total amount
of precipitation $S(n)$ of this $n$-day period is given by

$S(n) = \sum_{\nu=0}^{N_n} \xi_\nu$,  $n = 1, 2, \ldots$  (1.2)

where by definition $S(n) = 0$ if $N_n = 0$. Since $\xi_\nu > 0$
for all $\nu = 1, 2, \ldots, n$ it follows that

$0 = S(0) \le S(1) \le S(2) \le \ldots$  (1.3)

Provided $\xi_1, \xi_2, \ldots$ is a sequence of random variables
for which the central limit theorem holds, then if
certain regularity conditions are satisfied $S(n)$ is
asymptotically normal.

Finally, in connection with random variables
$N_n$ and $S(n)$, we will consider the following two
random variables defined as

$\zeta_\nu = \inf\{n;\ N_n = \nu\}$,  $\nu = 1, 2, \ldots$  (1.4)

$X(u) = \inf\{n;\ S(n) > u\}$,  $u \ge 0$.  (1.5)

Note that $\zeta_\nu$ and $X(u)$ are discrete and continuous
parameter processes, respectively, such
that

$\zeta_1 \le \zeta_2 \le \ldots$ and $X(u) \le X(u + \Delta u)$

for $u \ge 0$ and $\Delta u > 0$.


2. Mathematical Expectation and
Distribution Function of the
Process $S(n)$

Denote by $I_A(x)$ the indicator function of the set
$A$, that is,

$I_A(x) = 1$ if $x \in A$, and $I_A(x) = 0$ if $x \notin A$,

and consider the random variable $I_{\{N_n = k\}}$, where $N_n$
is defined by equation 1.1. Since $\{N_n = 0\}$, $\{N_n = 1\}$,
..., $\{N_n = n\}$ represents a countable partition of
the sample space, it follows that:

$\sum_{k=0}^{n} I_{\{N_n = k\}} = 1$,  $I_{\{N_n = i\}} I_{\{N_n = j\}} = 0$  (2.1)

for all $i \ne j = 0, 1, \ldots, n$.

On the basis of equation 1.2 and relations 2.1,
the sum $S(n)$ may be written as

$S(n) = \left(\sum_{\nu=0}^{N_n} \xi_\nu\right)\left(\sum_{k=0}^{n} I_{\{N_n = k\}}\right) = \sum_{k=0}^{n} \left[\sum_{\nu=0}^{k} \xi_\nu\right] I_{\{N_n = k\}} = \sum_{k=0}^{n} X_k I_{\{N_n = k\}}$

where $X_0 = 0$ and $X_k = \sum_{\nu=0}^{k} \xi_\nu$. In the following we
shall confine our attention to the case where $\xi_1$,
$\xi_2$, ..., $\xi_n$ are independent identically distributed
random variables with finite mean and variance,

$D\{\xi_\nu\} = \beta$ and $E\{\xi_\nu\} = a > 0$  (2.2)

where $D\{\xi\}$ indicates variance of the random
variable $\xi$.

The basic assumption underlying further development
is the hypothesis that for all $k = 1, 2, \ldots, n$:

$X_k$ and $I_{\{N_n = k\}}$  (2.3)

are mutually independent or that the events
$\{X_k \le x\}$ and $\{N_n = k\}$ are independent for $k =
1, 2, \ldots, n$. To clarify the physical meaning of
this supposition, write the second event as follows:

$\{N_n = k\} = \left\{\sum_{\nu=1}^{n} \eta_\nu = k\right\} = \bigcup_{1 \le i_1 < \ldots < i_k \le n} \{\eta_{i_1} = 1, \ldots, \eta_{i_k} = 1, \eta_{i_{k+1}} = 0, \ldots, \eta_{i_n} = 0\}$.  (2.4)

Therefore, the independence of $\{X_k \le x\}$ and event 2.4
means that the first event is independent of each
member of union 2.4. Physically speaking, it means
that information concerning which $k$ days in the
$n$-day period were wet and which $(n - k)$ were dry
does not contribute anything to our knowledge
concerning the corresponding amount of precipitation
$X_k$.

In the sequel, the mathematical expectation, the
variance, and the distribution function of the
process $S(n)$ will be determined provided 2.2 and
the independence hypothesis 2.3 hold. Under these
conditions, we have (for more details see Todorovic,
1970, footnote 3)

$E\{S(n)\} = \sum_{k=0}^{n} E\{X_k I_{\{N_n = k\}}\} = \sum_{k=1}^{n} E\{X_k\} P\{N_n = k\} = a \sum_{k=1}^{n} k P\{N_n = k\} = a E\{N_n\}$.  (2.5)

3 Todorovic, P. ON SOME PROBLEMS INVOLVING A RANDOM
NUMBER OF RANDOM VARIABLES. Ann. Math. Statis. 41:1059-1063.
1970.


Since

$E\{X_k\} = ka$

and

$E\{X_k^2\} = kE\{\xi_1^2\} + k(k-1)a^2$

it follows that

$E\{S(n)^2\} = \sum_{k=1}^{n} [kE\{\xi_1^2\} + k(k-1)a^2] P\{N_n = k\}$

$= E\{\xi_1^2\} E\{N_n\} + a^2 E\{N_n^2\} - a^2 E\{N_n\}$.

From this and equation 2.5 we have

$D\{S(n)\} = \beta E\{N_n\} + a^2 D\{N_n\}$.  (2.6)
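
A short numerical illustration of relations 2.5 and 2.6 follows. It is a sketch only, with a hypothetical function name; the example values assume exponentially distributed daily amounts with rate lambda = 2 (so a = 1/lambda and beta = 1/lambda squared) and a binomial number of wet days with n = 30 and p = 0.2, in which case the variance reduces to np(2 - p)/lambda squared.

```python
def rainfall_moments(mean_amount, var_amount, mean_count, var_count):
    """Mean and variance of total rainfall S(n) from relations 2.5 and 2.6."""
    mean_total = mean_amount * mean_count                            # a E{N_n}
    var_total = var_amount * mean_count + mean_amount ** 2 * var_count
    return mean_total, var_total

# Illustrative values (assumptions, not data from the paper).
n, p, lam = 30, 0.2, 2.0
a, beta = 1.0 / lam, 1.0 / lam ** 2           # exponential mean and variance
mean_s, var_s = rainfall_moments(a, beta, n * p, n * p * (1.0 - p))
```

Here the mean is np/lambda = 3.0 and the variance is 2.7, which agrees with np(2 - p)/lambda squared.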
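Remark 2.1.

Equations 2.5 and 2.6 can be obtained without
using 2.3 by assuming that the events $\{N_n \le k - 1\}$
and $\{\xi_k \le x\}$ are independent for $k = 1, 2, \ldots, n$.
To show this, write $S(n)$ in the following form:

$S(n) = \sum_{\nu=1}^{n} \xi_\nu \sum_{k=\nu}^{n} I_{\{N_n = k\}} = \sum_{\nu=1}^{n} \xi_\nu [1 - I_{\{N_n \le \nu - 1\}}]$.

Thus,

$E\{S(n)\} = \sum_{\nu=1}^{n} E\{\xi_\nu\} E[1 - I_{\{N_n \le \nu - 1\}}] = a \sum_{\nu=1}^{n} P\{N_n \ge \nu\} = a E\{N_n\}$

which proves the assertion.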

To determine the distribution function of $S(n)$,

$F_n(x) = P\{S(n) \le x\}$  (2.7)

we shall proceed as follows: keeping in mind that
$\{N_n = \nu\}$, $\nu = 0, 1, \ldots, n$, represents a countable
partition of the sample space, it follows that:

$F_n(x) = P\{N_n = 0\} + \sum_{\nu=1}^{n} P\{X_\nu \le x\} P\{N_n = \nu\}$  (2.8)

where assumption 2.3 is required in the last step.

To determine this distribution effectively, it is
necessary to compute the probabilities $P\{X_\nu \le x\}$
and $P\{N_n = \nu\}$, $\nu = 1, 2, \ldots, n$. The first probability
will be determined under the assumption that
$H(x)$ is exponential. Computation of $P\{N_n = \nu\}$ will
be done in section 4 under various assumptions
concerning the random variables

$\eta_1, \eta_2, \ldots, \eta_n$  (2.9)
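
Relation 2.8 can be evaluated directly once both sets of probabilities are available. The sketch below is illustrative and rests on two assumptions taken from the cases treated in this paper: daily amounts are exponential with rate lambda, so that the sum of nu amounts has a gamma distribution with integer shape, and the wet-day indicators are independent with a common probability p, so that the number of wet days is binomial. The function names are hypothetical.

```python
import math

def erlang_cdf(x, nu, lam):
    """P{X_nu <= x} for the sum of nu exponential amounts with rate lam."""
    if x <= 0.0:
        return 0.0
    term, total = 1.0, 0.0
    for i in range(nu):          # accumulates (lam x)^i / i!, i = 0..nu-1
        total += term
        term *= lam * x / (i + 1)
    return 1.0 - math.exp(-lam * x) * total

def total_rainfall_cdf(x, n, p, lam):
    """F_n(x) from relation 2.8 with binomial P{N_n = nu}, for x >= 0."""
    f = (1.0 - p) ** n           # P{N_n = 0}: no wet day in the period
    for nu in range(1, n + 1):
        p_count = math.comb(n, nu) * p ** nu * (1.0 - p) ** (n - nu)
        f += p_count * erlang_cdf(x, nu, lam)
    return f

curve = [total_rainfall_cdf(0.5 * i, 30, 0.2, 2.0) for i in range(21)]
```

The distribution has a jump of size (1 - p) to the power n at x = 0, the probability of an entirely dry period, and rises toward 1 as x increases.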

3.  Computation   of  the  Probability 

P{X.  ^  x) 

In  this  section,  an  elementary  way  of  computing 
the  distribution  function  P{X^  =S  x}  is  presented 
based  on  the  assumption  that 

//(x)  =  l-e-^^  (3.1) 

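We want to show that in this case

$$P\{X_\nu \le x\} = \frac{\lambda^{\nu}}{\Gamma(\nu)} \int_0^x u^{\nu-1} e^{-\lambda u}\,du, \tag{3.2}$$

where $\Gamma(\nu)$ represents the Gamma function.

This result is very well known and can be obtained easily by using
the technique of characteristic functions:

$$E\{e^{itX_\nu}\} = E\Big\{\exp\Big(it\sum_{k=1}^{\nu}\xi_k\Big)\Big\} = \big(E\{e^{it\xi_1}\}\big)^{\nu}.$$

Further, using the calculus of residues we have

$$E\{e^{it\xi_1}\} = \lambda\int_0^{\infty} e^{-(\lambda - it)x}\,dx = \frac{\lambda}{\lambda - it},$$

so that

$$E\{e^{itX_\nu}\} = \Big(1 - \frac{it}{\lambda}\Big)^{-\nu},$$

and this is the characteristic function of the Gamma distribution,
which proves equation 3.2.

The following is a more direct way which does not require the
residues theory. Consider,

$$\begin{aligned}
P\{X_\nu \le x\} &= \lambda^{\nu} \int\!\cdots\!\int_{x_1+\cdots+x_\nu \le x} \exp\{-\lambda(x_1+\cdots+x_\nu)\}\,dx_1 \cdots dx_\nu \\
&= \lambda^{\nu} \int_0^{x}\int_0^{x-x_1}\!\cdots\!\int_0^{x-x_1-\cdots-x_{\nu-1}} \exp\{-\lambda(x_1+\cdots+x_\nu)\}\,dx_1 \cdots dx_\nu \\
&= P\{X_{\nu-1} \le x\} - \lambda^{\nu-1} e^{-\lambda x} \int_0^{x}\int_0^{x-x_1}\!\cdots\!\int_0^{x-x_1-\cdots-x_{\nu-2}} dx_1 \cdots dx_{\nu-1}.
\end{aligned} \tag{3.3}$$

Write

$$I_{\nu-1} = \int_0^{x}\int_0^{x-x_1}\!\cdots\!\int_0^{x-x_1-\cdots-x_{\nu-2}} dx_1 \cdots dx_{\nu-1}.$$

For $\nu = 2$, we have

$$I_1 = \int_0^x dx_1 = x.$$

For $\nu = 3$,

$$I_2 = \int_0^{x}\int_0^{x-x_1} dx_1\,dx_2 = \int_0^{x} (x - x_1)\,dx_1 = \frac{x^2}{2!}.$$

Continuing in this way, we find that

$$I_{\nu-1} = \frac{x^{\nu-1}}{(\nu-1)!}.$$

Therefore, equation 3.3 becomes

$$P\{X_\nu \le x\} = P\{X_{\nu-1} \le x\} - e^{-\lambda x}\,\frac{(\lambda x)^{\nu-1}}{(\nu-1)!}. \tag{3.4}$$

Keeping in mind that $X_0 = 0$ and $\nu \ge 0$ and using equation 3.4
recurrently, we find that

$$\begin{aligned}
P\{X_1 \le x\} &= 1 - e^{-\lambda x} \\
P\{X_2 \le x\} &= P\{X_1 \le x\} - e^{-\lambda x}(\lambda x) \\
&\;\;\vdots \\
P\{X_\nu \le x\} &= P\{X_{\nu-1} \le x\} - e^{-\lambda x}\,\frac{(\lambda x)^{\nu-1}}{(\nu-1)!}.
\end{aligned}$$

Summing the left and the right sides of these equations, we obtain

$$P\{X_\nu \le x\} = 1 - e^{-\lambda x}\sum_{i=0}^{\nu-1}\frac{(\lambda x)^{i}}{i!}.$$


4. On Some Particular Cases

In the following, we shall use relations 2.8 and 3.2 to compute the
mathematical expectation, variance, and the distribution function of
the process $S(n)$ for three particular cases:

A: $\eta_1, \eta_2, \ldots, \eta_n$ represents a sequence of
independent random variables with:

$$P\{\eta_\nu = 1\} = p, \qquad 0 < p < 1 \tag{4.1}$$

for all $\nu = 1, 2, \ldots, n$.

B: $\eta_1, \eta_2, \ldots, \eta_n$ are independent but not
identically distributed, that is:

$$P\{\eta_\nu = 1\} = p_\nu, \qquad \nu = 1, 2, \ldots, n. \tag{4.2}$$

C: $\eta_1, \eta_2, \ldots, \eta_n$ is a Markov chain such that for
all $\nu = 1, 2, \ldots, n$:

$$P\{\eta_\nu = 1 \mid \eta_{\nu-1} = 0\} = q_0, \qquad P\{\eta_\nu = 1 \mid \eta_{\nu-1} = 1\} = q_1, \tag{4.3}$$

where $q_0$ and $q_1$ are independent of $\nu$.

Case A

By virtue of the assumption, we have

$$E\{\eta_\nu\} = p, \qquad E\{N_n\} = np \qquad \text{and} \qquad D\{N_n\} = np(1-p).$$

Keeping in mind that the mean and variance of exponential
distribution 3.1 are equal to

$$\alpha = 1/\lambda \qquad \text{and} \qquad \beta = 1/\lambda^2$$

respectively, by equations 2.5 and 2.6, it follows that

$$E\{S(n)\} = \frac{np}{\lambda} \qquad \text{and} \qquad D\{S(n)\} = \frac{np}{\lambda^2}\,(2-p).$$

To determine the distribution function $F_n(x)$ given by equation
2.8, we need $P\{N_n = \nu\}$. Because of relation 2.4, it is apparent
that

$$P\{N_n = \nu\} = \binom{n}{\nu} p^{\nu} (1-p)^{n-\nu}. \tag{4.3a}$$

Hence,

$$F_n(x) = (1-p)^n \Big[\,1 + \int_0^x \frac{1}{u}\, e^{-\lambda u}\, a_n(p, u)\,du \Big], \tag{4.4}$$

where

$$a_n(p, u) = \sum_{\nu=1}^{n} \binom{n}{\nu} \Big(\frac{\lambda p u}{1-p}\Big)^{\nu} \frac{1}{\Gamma(\nu)}.$$

In figure 1, graphical presentation of $F_n(x)$ is given.

Figure 1. Distribution function $F_n(x)$.

Case B

Here, we have

$$E\{N_n\} = \sum_{\nu=1}^{n} p_\nu \qquad \text{and} \qquad D\{N_n\} = \sum_{\nu=1}^{n} p_\nu (1 - p_\nu). \tag{4.5}$$

On the basis of this, it follows that

$$E\{S(n)\} = \frac{1}{\lambda}\sum_{\nu=1}^{n} p_\nu, \qquad D\{S(n)\} = \frac{1}{\lambda^2}\sum_{\nu=1}^{n} p_\nu + \frac{1}{\lambda^2}\sum_{\nu=1}^{n} p_\nu(1 - p_\nu),$$

and by virtue of relation 2.4

$$P\{N_n = \nu\} = \sum_{1 \le i_1 < \cdots < i_\nu \le n} p_{i_1} \cdots p_{i_\nu} \prod_{j \notin \{i_1, \ldots, i_\nu\}} (1 - p_j). \tag{4.6}$$

Hence, we have

$$F_n(x) = \prod_{\nu=1}^{n} (1 - p_\nu) + \sum_{\nu=1}^{n} P\{N_n = \nu\}\, \frac{\lambda^{\nu}}{\Gamma(\nu)} \int_0^x u^{\nu-1} e^{-\lambda u}\,du, \tag{4.6b}$$

with $P\{N_n = \nu\}$ given by equation 4.6.

It is apparent that case A immediately follows from case B if
$p_\nu = p$ for all $\nu = 1, 2, \ldots, n$.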
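
The closed form for $P\{X_\nu \le x\}$ and the case A mixture of
equation 2.8 can be evaluated directly. The following Python sketch is
an illustration only: the function names are hypothetical, and the
values p = 0.232 and lambda = 1.753 in the usage note are illustrative
inputs, not results asserted by the text.

```python
from math import comb, exp, factorial


def erlang_cdf(nu, lam, x):
    """P{X_nu <= x} for the sum of nu independent exponential(lam) amounts."""
    if x <= 0.0:
        return 0.0
    partial = sum((lam * x) ** i / factorial(i) for i in range(nu))
    return 1.0 - exp(-lam * x) * partial


def fn_case_a(n, p, lam, x):
    """F_n(x) = P{S(n) <= x} from eq. 2.8 with binomial P{N_n = nu} (case A)."""
    if x < 0.0:
        return 0.0
    total = (1.0 - p) ** n  # P{N_n = 0}: no wet day, so S(n) = 0
    for nu in range(1, n + 1):
        weight = comb(n, nu) * p ** nu * (1.0 - p) ** (n - nu)
        total += weight * erlang_cdf(nu, lam, x)
    return total
```

For instance, `fn_case_a(10, 0.232, 1.753, 0.0)` returns the
probability of a completely dry 10-day period, $(1-p)^{10}$, and the
value rises toward 1 as x grows.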

Case C

Here we shall use the results developed by Gabriel ² and applied by
Gabriel and Neumann.³ The following is a short summary of the second
paper. The basic hypothesis of that paper is that the probability of
rainfall on any day depends only on whether the previous day was wet
or dry. Given the event of the previous day, then, the probability of
rainfall is assumed independent of further preceding days; in other
words, that condition C holds. If $\eta_0$ denotes the random variable
which refers to the day preceding the first day of the n-day period,
which assumes only two values, 1 if the day was wet and 0 if the day
was dry, then the conditional probability

$$P\{N_n = \nu \mid \eta_0 = 1\} = \varphi_1(\nu, n)$$

of having $\nu$ wet days in the n-day period, provided the preceding
day was wet, is equal to

$$\varphi_1(\nu, n) = q_1^{\nu} (1 - q_0)^{n-\nu} \sum_{c=1}^{c_1} \binom{\nu}{a} \binom{n-\nu-1}{b-1} \Big(\frac{1-q_1}{1-q_0}\Big)^{b} \Big(\frac{q_0}{q_1}\Big)^{a} \tag{4.7}$$

where

$$c_1 = n + \tfrac{1}{2} - \big|\,2\nu - n + \tfrac{1}{2}\,\big| \qquad \text{if } \nu < n.$$

In the second case, that is, when $\nu = n$, we have:

$$\varphi_1(n, n) = q_1^{n} \tag{4.8}$$

The conditional probability

$$P\{N_n = \nu \mid \eta_0 = 0\} = \varphi_0(\nu, n)$$

of having $\nu$ wet days in the n-day period, provided the preceding
day was dry, is

$$\varphi_0(\nu, n) = q_1^{\nu} (1 - q_0)^{n-\nu} \sum_{c=1}^{c_0} \binom{\nu-1}{b-1} \binom{n-\nu}{a} \Big(\frac{1-q_1}{1-q_0}\Big)^{a} \Big(\frac{q_0}{q_1}\Big)^{b} \tag{4.9}$$

where

$$c_0 = n + \tfrac{1}{2} - \big|\,2\nu - n - \tfrac{1}{2}\,\big| \qquad \text{if } \nu > 0.$$

In the second case, that is, when $\nu = 0$,

$$\varphi_0(0, n) = (1 - q_0)^{n} \tag{4.10}$$

The constants a and b in equations 4.7 and 4.9 are defined:

$$a = \inf\{k : k \ge \tfrac{1}{2}(c - 1)\}, \qquad b = \inf\{k : k \ge \tfrac{1}{2}c\}. \tag{4.11}$$

In other words, a is the first (or least) integer not smaller than
$\tfrac{1}{2}(c-1)$. Similarly, b is the least integer not smaller
than $\tfrac{1}{2}c$.

² Gabriel, K. R. THE DISTRIBUTION OF THE NUMBER OF SUCCESSES IN A
SEQUENCE OF DEPENDENT TRIALS. Biometrika 46: 454-460. 1959.

³ Gabriel, K. R., and Neumann, J. A MARKOV CHAIN MODEL FOR DAILY
RAINFALL OCCURRENCE IN TEL AVIV. Roy. Met. Soc. (London) Quart. Jour.
88: 90-95. 1962.
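
The two conditional distributions of equations 4.7 to 4.10 can be
checked against a direct day-by-day evaluation of the two-state chain.
The Python sketch below is a minimal illustration; the function names
and the boolean argument `prev_wet` (standing for $\eta_0 = 1$) are
assumptions introduced here, and $q_0$, $q_1$ are taken strictly
between 0 and 1.

```python
from math import ceil, comb


def phi(nu, n, q0, q1, prev_wet):
    """P{N_n = nu | eta_0}, following the form of eqs. 4.7-4.10."""
    if prev_wet and nu == n:
        return q1 ** n                      # eq. 4.8
    if not prev_wet and nu == 0:
        return (1.0 - q0) ** n              # eq. 4.10
    t_dry = (1.0 - q1) / (1.0 - q0)
    t_wet = q0 / q1
    shift = 0.5 if prev_wet else -0.5
    c_max = int(n + 0.5 - abs(2 * nu - n + shift))
    total = 0.0
    for c in range(1, c_max + 1):
        a = ceil((c - 1) / 2)               # least integer >= (c - 1)/2
        b = ceil(c / 2)                     # least integer >= c/2
        if prev_wet:
            total += comb(nu, a) * comb(n - nu - 1, b - 1) * t_dry ** b * t_wet ** a
        else:
            total += comb(nu - 1, b - 1) * comb(n - nu, a) * t_dry ** a * t_wet ** b
    return q1 ** nu * (1.0 - q0) ** (n - nu) * total


def count_distribution(n, q0, q1, prev_wet):
    """Exact P{N_n = nu | eta_0}, nu = 0..n, by stepping the chain one day at a time."""
    dist = {(prev_wet, 0): 1.0}             # key: (previous day wet?, wet days so far)
    for _ in range(n):
        nxt = {}
        for (wet, k), pr in dist.items():
            p_wet = q1 if wet else q0
            nxt[(True, k + 1)] = nxt.get((True, k + 1), 0.0) + pr * p_wet
            nxt[(False, k)] = nxt.get((False, k), 0.0) + pr * (1.0 - p_wet)
        dist = nxt
    return [dist.get((True, k), 0.0) + dist.get((False, k), 0.0) for k in range(n + 1)]
```

The recursive evaluation is the simpler of the two to extend, for
example to transition probabilities that change from day to day.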

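Finally, the probability that $\nu$ wet days will occur in the n-day
period is

$$P\{N_n = \nu\} = R\,\varphi_1(\nu, n) + (1 - R)\,\varphi_0(\nu, n) \tag{4.12}$$

and the corresponding mathematical expectation

$$E\{N_n\} = nQ + (R - Q)\,\frac{d\,(1 - d^{\,n})}{1 - d} \tag{4.13}$$

where,

$$R = P\{\eta_0 = 1\}, \qquad d = q_1 - q_0 \qquad \text{and} \qquad Q = \frac{q_0}{1 - d}. \tag{4.14}$$

The variance is

$$\begin{aligned}
D\{N_n\} ={}& Q^2 (n^2 - n) + \frac{2d}{(1-d)^2} \Big\{ Q(1 + R - 2Q)\,\big[\,n(1-d) - (1 - d^{\,n})\,\big] \\
&+ (1 - Q)(R - Q)\,\big[\,d(1 - d^{\,n}) - n d^{\,n} (1 - d)\,\big] \Big\} \\
&+ nQ + (R - Q)\,\frac{d\,(1 - d^{\,n})}{1 - d} - \Big[\,nQ + (R - Q)\,\frac{d\,(1 - d^{\,n})}{1 - d}\,\Big]^2.
\end{aligned} \tag{4.15}$$

It is known (see Feller, 1966, p. 321) ⁴ that $N_n$ is asymptotically
normal with mean and variance:

$$E\{N_n\} \sim nQ \tag{4.16}$$

$$D\{N_n\} \sim nQ(1 - Q)\,\frac{1 + d}{1 - d} \tag{4.17}$$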
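
The mean 4.13 and the variance 4.15 can be compared with moments
computed from the exact distribution of $N_n$. The Python sketch below
evaluates the variance as the second moment minus the squared mean,
which is the structure of equation 4.15; the function names are
hypothetical and the argument `r` stands for $R = P\{\eta_0 = 1\}$.

```python
def wet_day_moments(n, q0, q1, r):
    """Mean and variance of N_n from the closed forms 4.13 and 4.15."""
    d = q1 - q0
    q = q0 / (1.0 - d)
    mean = n * q + (r - q) * d * (1.0 - d ** n) / (1.0 - d)
    bracket = (q * (1.0 + r - 2.0 * q) * (n * (1.0 - d) - (1.0 - d ** n))
               + (1.0 - q) * (r - q) * (d * (1.0 - d ** n) - n * d ** n * (1.0 - d)))
    second_moment = q * q * (n * n - n) + 2.0 * d / (1.0 - d) ** 2 * bracket + mean
    return mean, second_moment - mean ** 2


def wet_day_moments_exact(n, q0, q1, r):
    """Same moments from the exact distribution of N_n, built day by day."""
    dist = {(True, 0): r, (False, 0): 1.0 - r}   # key: (previous day wet?, wet days)
    for _ in range(n):
        nxt = {}
        for (wet, k), pr in dist.items():
            p_wet = q1 if wet else q0
            nxt[(True, k + 1)] = nxt.get((True, k + 1), 0.0) + pr * p_wet
            nxt[(False, k)] = nxt.get((False, k), 0.0) + pr * (1.0 - p_wet)
        dist = nxt
    mean = sum(k * pr for (_, k), pr in dist.items())
    second = sum(k * k * pr for (_, k), pr in dist.items())
    return mean, second - mean ** 2
```

For large n the variance per day approaches
$Q(1-Q)(1+d)/(1-d)$, in agreement with relation 4.17.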


The probability $P\{N_n = 0\}$ may be determined from the foregoing
equation or directly as follows:

$$P\{N_n = 0\} = R\,P\{N_n = 0 \mid \eta_0 = 1\} + (1 - R)\,P\{N_n = 0 \mid \eta_0 = 0\}.$$

Since

$$\begin{aligned}
P\{N_n = 0 \mid \eta_0 = 1\} &= P\{\eta_1 = 0, \ldots, \eta_n = 0 \mid \eta_0 = 1\} \\
&= P\{\eta_1 = 0 \mid \eta_0 = 1\}\,(1 - q_0)^{n-1} \\
&= (1 - q_1)(1 - q_0)^{n-1}.
\end{aligned}$$

Similarly,

$$P\{N_n = 0 \mid \eta_0 = 0\} = (1 - q_0)^{n}.$$

Hence,

$$P\{N_n = 0\} = \{(1 - q_0) - Rd\}\,(1 - q_0)^{n-1}.$$

From this and equation 4.12 it follows that

$$P\{S(n) \le x\} = \{(1 - q_0) - Rd\}\,(1 - q_0)^{n-1} + \sum_{\nu=1}^{n} \{R\,\varphi_1(\nu, n) + (1 - R)\,\varphi_0(\nu, n)\}\,P\{X_\nu \le x\}. \tag{4.18}$$

⁴ Feller, William. AN INTRODUCTION TO PROBABILITY THEORY AND ITS
APPLICATIONS. v. II. John Wiley & Sons, Inc., New York. 1966.
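
Equation 4.18 combines the distribution of the number of wet days with
the distribution of the total over a given number of wet days. A
minimal Python sketch of that combination follows. It is not the
program of the Appendix: here $P\{N_n = \nu\}$ is obtained by stepping
the chain day by day rather than from equations 4.7 and 4.9, daily
amounts are assumed exponential as in section 3, and all function
names are hypothetical.

```python
from math import exp, factorial


def wet_day_pmf(n, q0, q1, r):
    """P{N_n = nu}, nu = 0..n, for the two-state chain with P{eta_0 = 1} = r."""
    dist = {(True, 0): r, (False, 0): 1.0 - r}   # key: (previous day wet?, wet days)
    for _ in range(n):
        nxt = {}
        for (wet, k), pr in dist.items():
            p_wet = q1 if wet else q0
            nxt[(True, k + 1)] = nxt.get((True, k + 1), 0.0) + pr * p_wet
            nxt[(False, k)] = nxt.get((False, k), 0.0) + pr * (1.0 - p_wet)
        dist = nxt
    return [dist.get((True, k), 0.0) + dist.get((False, k), 0.0) for k in range(n + 1)]


def erlang_cdf(nu, lam, x):
    """P{X_nu <= x} for the sum of nu independent exponential(lam) amounts."""
    if x <= 0.0:
        return 0.0
    partial = sum((lam * x) ** i / factorial(i) for i in range(nu))
    return 1.0 - exp(-lam * x) * partial


def fn_markov_exponential(n, q0, q1, r, lam, x):
    """F_n(x) = P{S(n) <= x}: dry-period term plus the mixture over nu wet days."""
    if x < 0.0:
        return 0.0
    pmf = wet_day_pmf(n, q0, q1, r)
    return pmf[0] + sum(pmf[nu] * erlang_cdf(nu, lam, x) for nu in range(1, n + 1))
```

At x = 0 the function returns the jump
$\{(1 - q_0) - Rd\}(1 - q_0)^{n-1}$, the probability that the whole
period is dry.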


5. The Central Limit Theorem For S(n)

In this section, an attempt is made to investigate the asymptotic
behavior of the distribution function of $S(n)$ when $n \to \infty$.
To this end, consider the sum:

$$X_\nu = \sum_{k=1}^{\nu} \xi_k \tag{5.1}$$

and suppose that $\xi_1, \xi_2, \ldots$ is the sequence of independent
identically distributed random variables such that condition 2.2
holds. In that case, we have:

$$E\{X_\nu\} = \nu\alpha \qquad \text{and} \qquad D\{X_\nu\} = \nu\beta. \tag{5.2}$$

Then, according to the Lindeberg-Levy theorem, the sum 5.1 is
asymptotically normal with mean and variance given by relations 5.2.
In other words,

$$\lim_{\nu \to \infty} P\Big\{\frac{X_\nu - \nu\alpha}{\sqrt{\nu\beta}} \le x\Big\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\,du. \tag{5.3}$$

To determine the limiting distribution of $S(n)$, result 5.3 and
basic proposition 5.1 are needed. Consider the sequence of cumulative
sums

$$c_n = \sum_{\nu=1}^{n} p_\nu(n)\,u_\nu, \tag{5.4}$$

where $p_\nu(n)$ and $u_\nu$ satisfy the following conditions:

(a) $p_\nu(n) \ge 0$ for all $\nu = 0, 1, \ldots, n$ and
$p_\nu(n) \to 0$ as $n \to \infty$ for every finite
$\nu = 1, 2, \ldots$.

(b) $\sum_{\nu=1}^{n} p_\nu(n) \to 1$ as $n \to \infty$.

(c) for all $\nu = 1, 2, \ldots$, $u_\nu \ge 0$ and
$\lim_{\nu \to \infty} u_\nu = \gamma < \infty$.

Under these three conditions, we have

$$\lim_{n \to \infty} c_n = \gamma. \tag{5.5}$$

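Proof. On the basis of condition c, for every $\epsilon > 0$ there
exists $m = m(\epsilon)$ such that for every $\nu > m$,
$u_\nu > \gamma - \epsilon$. Therefore, for $n > m$ we have

$$c_n = \sum_{\nu=1}^{m} p_\nu(n)\,u_\nu + \sum_{\nu=m+1}^{n} p_\nu(n)\,u_\nu \ge \sum_{\nu=1}^{m} p_\nu(n)\,u_\nu + (\gamma - \epsilon) \sum_{\nu=m+1}^{n} p_\nu(n)$$

for every $n > m$. If we let $n \to \infty$, the first term on the
right side of this inequality tends to zero (according to condition a)
and:

$$\sum_{\nu=m+1}^{n} p_\nu(n) \to 1$$

according to b. Thus,

$$\liminf_{n \to \infty} c_n \ge (\gamma - \epsilon). \tag{5.6}$$

Similarly, for $\epsilon > 0$ there is $m(\epsilon)$ such that for
all $\nu > m(\epsilon)$, $u_\nu < \gamma + \epsilon$.

Therefore, assuming $n > m$ we have

$$c_n = \sum_{\nu=1}^{m} p_\nu(n)\,u_\nu + \sum_{\nu=m+1}^{n} p_\nu(n)\,u_\nu < \sum_{\nu=1}^{m} p_\nu(n)\,u_\nu + (\gamma + \epsilon) \sum_{\nu=m+1}^{n} p_\nu(n).$$

From this it follows that

$$\limsup_{n \to \infty} c_n \le (\gamma + \epsilon).$$

Since

$$(\gamma - \epsilon) \le \liminf_{n \to \infty} c_n \le \limsup_{n \to \infty} c_n \le (\gamma + \epsilon)$$

for all $\epsilon > 0$, the proof of the proposition follows.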
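
A small numerical illustration of the proposition may help. The
Python sketch below uses binomial weights for $p_\nu(n)$ and the
sequence $u_\nu = \gamma + 1/\nu$; both choices, and the function
name, are assumptions made only for this example. They satisfy
conditions a, b, and c, so $c_n$ should approach $\gamma$.

```python
from math import comb


def weighted_limit(n, p=0.3, gamma=2.0):
    """c_n = sum over nu = 1..n of p_nu(n) * u_nu, as in eq. 5.4."""
    total = 0.0
    for nu in range(1, n + 1):
        weight = comb(n, nu) * p ** nu * (1.0 - p) ** (n - nu)  # p_nu(n) >= 0
        u_nu = gamma + 1.0 / nu                                 # u_nu -> gamma
        total += weight * u_nu
    return total
```

Evaluating `weighted_limit(n)` for n = 10, 100, and 1000 gives values
that move steadily closer to `gamma`, because the weights shift toward
large $\nu$ where $u_\nu$ is close to its limit.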

In the sequel, this proposition will be used to determine the
asymptotic behavior of $S(n)$ for large n. What we actually want to
show is that when they are appropriately normalized and centralized,
the sequences $\{S(n)\}_1^{\infty}$ and $\{X_n\}_1^{\infty}$ have the
same asymptotic distribution as $n \to \infty$. Then by virtue of
relation 5.3 it follows that $S(n)$ is asymptotically normal.

Proposition 5.2. Suppose that for all finite $\nu = 0, 1, \ldots$

$$\lim_{n \to \infty} P\{N_n = \nu\} = 0 \tag{5.7}$$

and that relation 5.3 and condition 2.3 hold, then

$$\lim_{n \to \infty} P\Big\{\frac{S(n) - \alpha N_n}{\sqrt{\beta N_n}} \le x\Big\} = \lim_{\nu \to \infty} P\Big\{\frac{X_\nu - \alpha\nu}{\sqrt{\beta\nu}} \le x\Big\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\,du. \tag{5.8}$$

Proof. Keeping in mind that $\{N_n = 0\}, \{N_n = 1\}, \ldots,
\{N_n = n\}$ represents a countable partition of the sample space and
that $S(n) = X_\nu$ on the set $\{N_n = \nu\}$, we have

$$P\Big\{\frac{S(n) - \alpha N_n}{\sqrt{\beta N_n}} \le x\Big\} = P\{N_n = 0\} + \sum_{\nu=1}^{n} P\Big\{\frac{X_\nu - \alpha\nu}{\sqrt{\beta\nu}} \le x\Big\}\,P\{N_n = \nu\}.$$

On the basis of the conditions of the proposition and proposition
5.1, we have

$$\lim_{n \to \infty} P\Big\{\frac{S(n) - \alpha N_n}{\sqrt{\beta N_n}} \le x\Big\} = \lim_{\nu \to \infty} P\Big\{\frac{X_\nu - \alpha\nu}{\sqrt{\beta\nu}} \le x\Big\}.$$

From this and relation 5.3 the proof of the assertion follows.

Proposition 5.2 asserts that if $\xi_1, \xi_2, \ldots$ is a sequence
of independent identically distributed random variables with finite
mean and variance, then (if conditions 2.3 and 5.7 hold) $S(n)$ is
asymptotically normal with mean $\alpha n$ and variance $\beta n$. A
similar theorem can be proven for nonidentically distributed but
independent random variables. If condition 2.3 does not hold, the
situation is not so simple. This case will not be considered in this
paper.

6. On the Processes $\zeta_\nu$ and $\chi(u)$

It is frequently of interest to consider the following problem: Given
the number of rainy days $\nu$, what is the minimum number of days n
such that $N_n = \nu$? This is the meaning of the random variable
$\zeta_\nu$ defined by equation 1.4. The distribution functions of
$\zeta_\nu$ and $N_n$ are related by the apparent identity

$$P\{\zeta_\nu \le n\} = 1 - P\{N_n \le (\nu - 1)\} \tag{6.1}$$

where obviously $n = \nu, \nu + 1, \ldots$. For example, in the
case A:

$$P\{\zeta_\nu \le n\} = 1 - \sum_{k=0}^{\nu-1} \binom{n}{k} p^{k} (1 - p)^{n-k}. \tag{6.2}$$

By virtue of the definition (see equation 1.5), we have:

$$P\{\chi(u) \le n\} = 1 - P\{S(n) \le u\}. \tag{6.3}$$

The random variable $\chi(u)$ represents the least number of days, n,
in which the corresponding amount of precipitation exceeds u. Using
results of previous sections, the distribution functions 6.1 and 6.3
can be effectively determined.
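
For case A the identity 6.1 can be checked numerically: the
probability that the $\nu$th rainy day arrives within n days must
equal the accumulated probabilities that it falls exactly on day
$\nu, \nu + 1, \ldots, n$. The Python sketch below assumes independent
days with wet-day probability p; the function names are hypothetical.

```python
from math import comb


def first_passage_cdf(nu, n, p):
    """P{nu rainy days occur within n days} = 1 - P{N_n <= nu - 1} (eqs. 6.1-6.2)."""
    if n < nu:
        return 0.0
    below = sum(comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(nu))
    return 1.0 - below


def first_passage_pmf(nu, m, p):
    """P{the nu-th rainy day falls exactly on day m}, for independent days."""
    if m < nu:
        return 0.0
    return comb(m - 1, nu - 1) * p ** nu * (1.0 - p) ** (m - nu)
```

Summing `first_passage_pmf(nu, m, p)` over m from `nu` to `n`
reproduces `first_passage_cdf(nu, n, p)`.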


7. Numerical Example

As a numerical example to illustrate the foregoing methods, daily
rainfall data for Austin, Tex., will be used. Records are available
for May for the period 1861-1967, with 7 missing years.

Case A

Number of rainy days. The observed and theoretical distributions for
the number of rainy days in the first 10-day, the first 20-day, and
the first 30-day periods are presented in figure 2. The theoretical
values were computed for the case A (binomial) and case C (Markov
chain) models. The parameter, p, for the binomial model is shown on
the figure, and the sample values for the parameters in the Markov
chain model are shown in table 1.

The Markov chain model gives a better approximation to the number of
rainy days than does the binomial, which is consistent with previous
observations.⁵ ⁶

Table 1. Sample values of parameters in Markov chain counting model ¹

Period          q_0       q_1

1st 10 days     0.1837    0.3744
2d 10 days       .1921     .3916
3d 10 days       .1727     .4407
30 days          .1828     .4025

¹ R = 0.24.

Case B

Daily rainfall. Relative frequency distributions of daily values of
precipitation for the first, second, and third 10-day periods are
shown in figure 3. These three distributions are very similar, and
their shapes suggest that an exponential distribution may give a good
fit. In figure 4, the observed and corresponding theoretical
cumulative distributions of daily values of precipitation for the
first, second, and third 10-day periods are given. The exponential
distribution gives a very good fit for the first 10-day period. For
the second and third 10-day periods, however, the observed frequency
of rainfall amounts less than 0.4 inch is considerably greater than
the theoretical.

Case C

Rainfall for n days. The distribution functions for n-day rainfall
were computed for the binomial-exponential model (equation 4.4) and
the Markov chain-exponential model (equation 4.18) and are shown in
figures 5 and 6, respectively. The computer program for the Markov
chain model is presented in the Appendix.

Both models give an excellent fit for total rainfall for the first
10 days. The Markov chain model is slightly superior to the binomial
for the 20-day and 30-day periods, although both are distorted by the
less satisfactory fit of the exponential to daily rainfall values for
the second and third 10-day periods.

⁵ See footnote 3.

⁶ Caskey, James E., Jr. A MARKOV CHAIN MODEL FOR THE PROBABILITY OF
PRECIPITATION OCCURRENCE IN INTERVALS OF VARIOUS LENGTH. Monthly
Weather Rev. 91: 298-301. June 1963.
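
The sample parameters of table 1 can be turned into the derived
quantities of section 4. The Python sketch below computes
$d = q_1 - q_0$, $Q = q_0/(1 - d)$, and the expected number of wet
days from equation 4.13 for each row; the dictionary layout and the
function name are assumptions made for this illustration, and the
pairing of each row with a period length n is left to the caller.

```python
# (q0, q1) pairs as listed in table 1; R = 0.24 is the table footnote value.
TABLE_1 = {
    "1st 10 days": (0.1837, 0.3744),
    "2d 10 days": (0.1921, 0.3916),
    "3d 10 days": (0.1727, 0.4407),
    "30 days": (0.1828, 0.4025),
}
R = 0.24


def chain_summary(q0, q1, r, n):
    """Return (d, Q, E{N_n}) for the two-state chain, following eqs. 4.13-4.14."""
    d = q1 - q0
    q = q0 / (1.0 - d)
    mean = n * q + (r - q) * d * (1.0 - d ** n) / (1.0 - d)
    return d, q, mean


if __name__ == "__main__":
    for period, (q0, q1) in TABLE_1.items():
        n = 30 if period == "30 days" else 10
        d, q, mean = chain_summary(q0, q1, R, n)
        print(f"{period:12s} d={d:.4f} Q={q:.4f} E[N_n]={mean:.3f}")
```

Because R is close to Q for these data, the correction term in
equation 4.13 is small and the expected count is close to nQ.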


Figure 2. Theoretical and observed distributions for the number of
rainy days in May (Austin, Tex., 1861-1967). Panels: first 10 days in
May (p = 0.232), first 20 days in May (p = 0.241), first 30 days in
May (p = 0.243). Series: binomial, Markov chain, observed.

Figure 3. Frequency distribution of daily values of precipitation for
the first, second, and third 10-day period in May.


8. Summary and Conclusions

The most general forms for the distribution function, mathematical
expectation, variance, and first passage time of the total amount of
precipitation, $S(n)$, during an n-day period have been determined.
Assuming certain regularity conditions hold, it has also been shown
that the distribution of $S(n)$ is asymptotically normal. Three
special cases of the general case given by equation 2.8 were
considered: the sequence of daily rainfall occurrences is (a) a
sequence of independent, identically distributed random variables;
(b) a sequence of independent random variables; and (c) a Markov
chain. Assuming that the distribution of daily rainfall amounts is
exponential, the expressions for the distribution function of $S(n)$
are developed for case a and case c and a numerical example is given.
For this case, the Markov chain-exponential model is superior to the
binomial-exponential model.

The parameters in the distribution function can be readily
interpreted in terms of either the length of period, the nature of the
counting process for the number of rainy days, or the distribution of
daily rainfall amounts. This structure seems preferable to previously
used models in which parameters may be correlated with the length of
period.

The Markov chain-exponential model is sufficiently promising to
warrant testing in several climatic regions.


Appendix A

Program for Computing the Distribution Function for the Markov
Chain-exponential Model (equation 4.18)

The following program is written in FORTRAN IV. The input parameters
are explained in the comments section. The variable SBAR controls the
interval at which $F_n(x)$ is computed ($\Delta x$ = SBAR/5.). If we
assume daily rainfall amounts to be exponentially distributed, the
expression $P\{X_\nu \le x\}$ in equation 4.18 can be written as

$$P\{X_\nu \le x\} = \frac{\lambda^{\nu}}{(\nu - 1)!} \int_0^x u^{\nu-1} e^{-\lambda u}\,du.$$

In the program, the above integral is obtained by numerical
integration using a trapezoidal approximation with increments,
$\Delta u$, set at x/50.
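
The effect of the trapezoidal approximation with 50 increments can be
gauged by comparing it with the closed form of section 3. The Python
sketch below mirrors the integration rule described above but is not a
transcription of the FORTRAN listing; the function names are
hypothetical.

```python
from math import exp, factorial


def erlang_cdf_trapezoid(nu, lam, x, steps=50):
    """P{X_nu <= x} by the trapezoidal rule with increments du = x/steps."""
    if x <= 0.0:
        return 0.0
    du = x / steps
    coef = lam ** nu / factorial(nu - 1)
    f_prev = 1.0 if nu == 1 else 0.0        # integrand u**(nu-1) * exp(-lam*u) at u = 0
    integral = 0.0
    for k in range(1, steps + 1):
        u = k * du
        f_next = u ** (nu - 1) * exp(-lam * u)
        integral += 0.5 * (f_prev + f_next) * du
        f_prev = f_next
    return coef * integral


def erlang_cdf_exact(nu, lam, x):
    """Closed form 1 - exp(-lam*x) * sum_{i < nu} (lam*x)**i / i!."""
    if x <= 0.0:
        return 0.0
    partial = sum((lam * x) ** i / factorial(i) for i in range(nu))
    return 1.0 - exp(-lam * x) * partial
```

Increasing `steps` reduces the discrepancy roughly in proportion to
the square of the increment, as expected for the trapezoidal rule.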


PROGRAM MCEX    FORTRAN EXTENDED VERSION 2.0    09/10/71 13.12.06.

      PROGRAM MCEX(INPUT,OUTPUT,TAPE5=INPUT,TAPE6=OUTPUT)
C THIS PROGRAM COMPUTES THE CDF FOR TOTAL RAINFALL FOR N DAYS. INPUT
C PARAMETERS ARE N,QO=P(DAY I IS WET GIVEN DAY I-1 IS DRY), QI=P(DAY
C I IS WET GIVEN DAY I-1 IS WET), XLAM IS PARAMETER IN NEG. EXPONENTIAL
C DISTRIBUTION, R IS P(DAY BEFORE SERIES BEGINS IS WET), SBAR=APPROXIMATE
C VALUE OF MEAN PRECIPITATION FOR N DAY PERIOD
      DIMENSION PSIO(50),PSII(50)
   15 READ(5,18)N,QO,QI,R,XLAM,SBAR
      IF(EOF(5))17,19
   17 STOP
   18 FORMAT(I5,5F10.5)
   19 WRITE(6,20)N,QO,QI,R,XLAM
   20 FORMAT(//10X,*N=*,I3,*  QO=*,F5.4,*  QI=*,F5.4,*  R=*,F5.4,
     1*  LAMDA=*,F5.2)
      WRITE(6,23)
   23 FORMAT(/10X,*    X       FN(X)  *)
C THIS SECTION COMPUTES THE CONDITIONAL COUNTING PROCESS DENSITY FUNCTIONS
C       PSIO(I),PSII(I)   WHERE I=NU+1
      PSII(1)=(1.-QO)**(N-1)*(1.-QI)
      PSIO(1)=(1.-QO)**N
      NU=0
      NT=N+1
      DO 72 I=2,N
      NU=NU+1
C NCI AND NCO ARE THE UPPER SUMMATION LIMITS IN EQUATIONS 4.7 AND 4.9
      NCI=IFIX(N+0.5-ABS(2*NU-N+0.5)+0.01)
      NCO=IFIX(N+0.5-ABS(2*NU-N-0.5)+0.01)
      NC=0
      A=0.0
      B=1.0
      NSW=1
      SUMI=0.0
      SUMO=0.0
      TERMI=(1.-QI)/(1.-QO)
      TERMO=QO/QI
   52 SUMI=SUMI+TERMI
      SUMO=SUMO+TERMO
      NC=NC+1
      IF(NC.EQ.NCO)56,70
   56 PSIO(I)=QI**NU*(1.-QO)**(N-NU)*SUMO
      IF(NCO.GT.NCI)72,58
   58 IF(NC.EQ.NCI)71,59
C NSW ALTERNATES SO THAT A AND B ARE INCREASED IN TURN (EQUATION 4.11)
   59 GO TO(60,75),NSW
   60 NSW=2
      A=A+1.0
      TERMI=TERMI*(NU-A+1)/A*QO/QI
      TERMO=TERMO*(N-NU-A+1)/A*(1.-QI)/(1.-QO)
      GO TO 52
   70 IF(NC.EQ.NCI)71,59
   71 PSII(I)=QI**NU*(1.-QO)**(N-NU)*SUMI
      IF(NCO.GT.NCI)59,72
   72 CONTINUE
      GO TO 85
   75 NSW=1
      B=B+1.
      TERMI=TERMI*(N-NU-B+1.)/(B-1.)*(1.-QI)/(1.-QO)
      TERMO=TERMO*(NU-B+1.)/(B-1.)*QO/QI
      GO TO 52
   85 PSII(NT)=QI**N
      PSIO(NT)=QI**N*QO/QI
      WRITE(6,90)(PSIO(I),I=1,N)
C THIS SECTION COMPUTES THE CDF OF THE SUM OF DAILY RAINFALLS FOR N DAYS
      NCOUNT=0
      X=0.0
      FNX=((1.-QO)-R*(QI-QO))*(1.-QO)**(N-1)
  104 WRITE(6,105)X,FNX
  105 FORMAT(10X,2F10.6)
      X=X+SBAR/5.
      IF(FNX.GT.0.99.OR.NCOUNT.GT.100)GO TO 15
      SUM=0.0
      UINT=0.0
      COEF=XLAM
      U=0.0
      DU=X/50.
      NU=1
      NCOUNT=NCOUNT+1
C TRAPEZOIDAL INTEGRATION OF U**(NU-1)*EXP(-XLAM*U) FROM 0 TO X
  115 IF(NU.EQ.1)116,118
  116 FU1=1.0
      GO TO 119
  118 FU1=0.0
  119 U=U+DU
      FU2=U**(NU-1)*EXP(-XLAM*U)
      UINT=UINT+(FU1+FU2)*DU*0.5
      IF(U.GE.X-0.01*DU)130,126
  126 FU1=FU2
      GO TO 119
  130 TERM=(R*PSII(NU+1)+(1.-R)*PSIO(NU+1))*COEF*UINT
      SUM=SUM+TERM
      IF(NU.EQ.N)135,140
  135 FNX=((1.-QO)-R*(QI-QO))*(1.-QO)**(N-1)+SUM
      GO TO 104
C COEF HOLDS XLAM**NU/(NU-1) FACTORIAL FOR THE CURRENT NU
  140 NU=NU+1
      COEF=COEF*XLAM/(NU-1)
      U=0.0
      UINT=0.0
      GO TO 115
      END

Figure 4. Observed and theoretical (exponential) distributions of
daily values of precipitation. Panels: a. First 10 days in May
(λ = 1.753); b. Second 10 days in May (λ = 1.600); c. Third 10 days in
May (λ = 1.710).

Figure 5. Observed and computed distribution functions;
binomial-exponential model.

Figure 6. Observed and computed distribution functions; Markov
chain-exponential model.


Appendix B

List of Symbols

$\eta_\nu$            is a random variable equal to 1 if νth day is wet
                      and equal to 0 if νth day is dry.

$N_n$                 is the number of wet days in the n-day period.

$\xi_\nu$             is the amount of precipitation of νth rainy day
                      in the n-day period.

$S(n)$                is the total amount of precipitation of the n-day
                      period.

$\zeta_\nu$           is the minimum number of days in which ν rainy
                      days occur ($\zeta_\nu = \nu, \nu + 1, \ldots$).

$\chi(u)$             is the minimum number of days (the shortest
                      period) whose corresponding total amount of
                      precipitation exceeds a value u.

$J_A(x)$              is the indicator of the set A.

$X_k$                 is the total amount of precipitation of k wet
                      days.

$E\{X\}$              is the mathematical expectation of the random
                      variable X.

$D\{X\}$              denotes variance of X.

$F_n(x)$              is the distribution function of $S(n)$.

$q_0$                 is the conditional probability that νth day is
                      wet provided (ν-1)th day was dry.

$q_1$                 is the conditional probability that νth day is
                      wet provided (ν-1)th day was wet.

$\Gamma(\nu)$         represents the Gamma function.

$\varphi_0(\nu, n)$   represents the probability that in the n-day
                      period ν days will be wet provided the day
                      preceding the n-day period was dry.

$\varphi_1(\nu, n)$   represents the conditional probability of having
                      ν wet days in the n-day period provided the day
                      preceding the n-day period was wet.

$R$                   $P\{\eta_0 = 1\}$

$d$                   $q_1 - q_0$
AN EVENT-BASED STOCHASTIC MODEL OF AREAL RAINFALL AND RUNOFF


By M. M. Fogel, L. Duckstein, and J. L. Sanders ¹


Abstract

A stochastic model of rainfall is developed for the purpose of
obtaining runoff sequences on watersheds where the maximum events are
most likely to be the result of thunderstorms. Runoff-producing
rainfall events for a watershed are defined as a function of minimum
levels of point measurements. The causal factors of a point
measurement, given the occurrence of an event over a watershed, are
the magnitude, location, and pattern of that areal event. It is found
that the probability distribution function of point rainfall is
geometric. Random variables representing precipitation at several
points are then convoluted to yield mean areal values of rainfall
whose distribution is negative binomial. Chicago and New Orleans data
seem to validate the model developed in Tucson.

A table describing seasonal rainfall combinations is used to generate
possible annual sets of events. Corresponding likely sets of runoff
amounts are then obtained using both a simple linear rainfall-runoff
relationship and the Soil Conservation Service formula for estimating
runoff.

A study of the frequency distribution of annual number and duration
of runoff events using an independent set of data provides a further
validation of the proposed model.

There clearly is a limit to the advances that can be made in
engineering hydrology by considering watershed runoff as a stochastic
process. When runoff variability is large, more realistic streamflow
volume models can be constructed by modelling precipitation as a
stochastic process and routing stochastically generated precipitation
through a deterministic watershed model, thus more accurately
reflecting the complex interactions that occur within the watershed.

¹ Professor of watershed management and Professors of Systems
Engineering, respectively, University of Arizona, Tucson.


Introduction 

Effective management of water resources is dependent on
rainfall-runoff-quality relationships and on a set of inputs
adequately represented in the time-space domain. An important input in
many hydrologic systems is the short-duration, high-intensity
localized thunderstorm rainfall that produces most of the floods on
urban and small rural watersheds.

This paper is concerned with the frequency of occurrence of storm
rainfall rather than the variation of rainfall intensity within a
storm. Wiesner (15) has estimated that a 30- to 50-year record is
required to obtain stable precipitation frequency distributions for
most regions. Thus, there is need to develop models which can be
calibrated with only a few years of data and which can be extrapolated
to ungaged watersheds.

For reasons of convenience or tradition, hydrologic data are taken at
or averaged over equally spaced time intervals. While such information
constitutes useful data, especially for widespread cyclonic storms or
perennial streamflows, it may be of limited value for summer-type
precipitation or for intermittent flow, principally because of limited
sample size. It appears that an approach based on naturally occurring
events is warranted whenever well-defined events, such as point
precipitation greater than 2 inches, occur relatively infrequently so
that their effects are usually separated by a time interval that
contains the "null event." Most hydrologic phenomena in semiarid
countries are of this type, as are flood occurrences in any climatic
region  (/<)). 

The design of flood control structures depends upon reliable
forecasting of runoff, which in turn depends upon a correct
identification of rainfall inputs. In this paper, input into a given
watershed will be defined by event-based stochastic models from which
runoff events can be generated using known and simple rainfall-runoff
relationships.

Basic  Rainfall  Probability  Model 

Event-based models are characterized by at least two random variables
and their distribution. First, there is the random number of events
per unit of time (season or year) designated as N. An alternate choice
would be the interarrival time between events. Then there are one or
more of the random variables of interest such as the depth of rainfall
R, the runoff volume Q, the peak rate of runoff QP, the duration of
runoff DR, and so on. If more than one such random variable is of
interest, then a joint distribution would be needed.

Number  of  Events  Per  Season 

Thunderstorms seem to occur in an independent manner in time and
space such that the number of rainfall events per season N follows a
Poisson probability mass function (3). A similar result was obtained
in a study by Todorovic and Yevdjevich (14). While a 15-year record is
hardly sufficient to verify a particular distribution for an annual
event, data collected from the Atterbury Experimental Watershed in
Tucson, Ariz., indicated a definite trend towards a Poisson variate
for the number of storm events per season. Thus, the probability mass
function for N can be written as

$$f_N(j) = \frac{e^{-m} m^{j}}{j!}, \qquad j = 0, 1, 2, \ldots \tag{1}$$

where m is the mean number of events per season which has, for later
use, a moment generating function

$$M_N(s) = e^{-m + ms}. \tag{2}$$

Further evidence justifying the Poisson distribution was obtained in
an analysis of the number of summer runoff events for both a small
urban watershed (13) and a larger combination urban-rural
watershed (1).


Depth  of  Point  Rainfall 

As shown in an earlier paper (7), a geometric probability mass
function for point rainfall depths was derived from an analysis of the
dense network of rain gages at the aforementioned Atterbury watershed.
Thus, for j units of rain, the distribution of the depth of rainfall
R is

$$f_R(j) = (1 - p)\,p^{j}, \qquad j = 0, 1, 2, \ldots \tag{3}$$

where p is the probability of success of the point in question
receiving any amount of rain above a given threshold value including
zero. The moment generating function for equation 3 is

$$M_R(s) = \frac{1 - p}{1 - ps} \tag{4}$$

while its continuous equivalent is the exponential probability
density function

$$f_R(x) = u\,e^{-ux}, \qquad x > 0. \tag{5}$$

Maximal  Distribution  of  Point  Rainfall 

If there is only one event per season, the probability of receiving a
given amount of rain or more is obtained through summation of the
geometric distribution of R, equation 3. Fogel and Duckstein (7) have
shown that for a random number N events per year, the maximal
distribution function is

(6)

in which $F_R(k)$ is the cumulative distribution function for point
rainfall R given the occurrence of an event.
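
The maximal distribution can be illustrated numerically under
assumptions that are stated here rather than taken from the text: the
number of events N is Poisson with mean m (equation 1), event depths
are independent with the geometric distribution of equation 3 and
independent of N, and a season with no events counts as having a
maximum not exceeding k. Under those assumptions the probability that
the seasonal maximum does not exceed k is $\exp\{-m[1 - F_R(k)]\}$.
The Python sketch below checks this against a direct sum over the
number of events; the function names are hypothetical.

```python
from math import exp, factorial


def geometric_cdf(k, p):
    """F_R(k) = P{R <= k} for the mass function (1 - p) * p**j, j = 0, 1, 2, ..."""
    return 1.0 - p ** (k + 1)


def seasonal_max_cdf(k, m, p):
    """P{largest event of the season <= k} with a Poisson(m) number of events."""
    return exp(-m * (1.0 - geometric_cdf(k, p)))


def seasonal_max_cdf_by_sum(k, m, p, terms=80):
    """Same probability by conditioning on the number of events N = j."""
    f = geometric_cdf(k, p)
    return sum(exp(-m) * m ** j / factorial(j) * f ** j for j in range(terms))
```

The closed form avoids the truncated sum and makes the dependence on
the mean number of events m explicit.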

Extension  of  Basic  Model 

The basic rainfall probability model (equations 1, 3, and 6) deals
only with rainfall at a point in space. Based on this model, the
development of other models is presented in this section. For
hydrologic models using lumped parameters, the average storm rainfall
over a particular watershed is often required. In other instances, the
total seasonal precipitation may be of importance. Finally, for those
situations where antecedent rainfall is needed, the generation of a
time series of events will also be discussed.

Distribution  of  Areal  Rainfall 

Consider  that  a  thunderstorm  system  containing 
several  cells  is  present  over  a  watershed  during  the 
period  considered,  say  24  hours.  If  the  measuring 
points,  or  rain  gages,  are  sufficiently  far  apart  so 
that  the  measurements  at  any  two  gages  come 
almost  always  from  two  different  cloudbursts,  it 
can  be  said  that  the  system  has  spatial  inde- 
pendence and  temporal  dependence. 

For  a  given  storm,  the  areal  rainfall  B,  or  the 
mean  rainfall  over  a  given  area,  may  be  considered 
as  the  average  of  a  fixed  number  of  n  point  rainfall 
measurements  ^i,  ^2,  .  .  .,  Rn.  Since  the  average 
amount  of  rainfall  for  the  n  gages  differs  from  the 
sum  of  the  n  gages  only  by  a  constant  scaling  factor, 
it  is  possible  to  use  this  latter  value  in  developing  a 
probability  mass  function  for  areal  rainfall. 

If the point rainfall variates $R_1, R_2, \ldots, R_n$ are mutually independent with an identical probability mass function (equation 3), then the areal rainfall B has a negative binomial distribution with parameters p and r = n

$$f_B(j) = \binom{r+j-1}{j}(1-p)^r p^j, \qquad j = 0, 1, 2, \ldots \tag{7}$$

The above distribution is simply the r-fold convolution of the geometric distribution (equation 3) as seen by its moment generating function

$$M_B(s) = \left[\frac{1-p}{1-ps}\right]^r \tag{8}$$
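The convolution claim can be checked numerically. The sketch below assumes that equation 3 has the form P(R = j) = (1 - p) p^j for j = 0, 1, 2, ..., which is the form implied by the generating function in equation 8; the function names are illustrative only and are not taken from the paper.

```python
from math import comb


def geometric_pmf(j, p):
    """Assumed form of equation 3: P(R = j) = (1 - p) * p**j, j = 0, 1, 2, ..."""
    return (1.0 - p) * p**j


def negative_binomial_pmf(j, r, p):
    """Equation 7: P(B = j) = C(r + j - 1, j) * (1 - p)**r * p**j."""
    return comb(r + j - 1, j) * (1.0 - p) ** r * p**j


def convolve(f, g):
    """Discrete convolution of two pmfs stored as lists indexed from 0.

    Only indices up to len(f) - 1 are kept; those entries are exact because
    they depend only on lower-index terms of f and g.
    """
    out = [0.0] * len(f)
    for i, fi in enumerate(f):
        for k, gk in enumerate(g):
            if i + k < len(out):
                out[i + k] += fi * gk
    return out


def r_fold_geometric(r, p, jmax):
    """pmf (indices 0..jmax) of the sum of r >= 1 independent geometric variates."""
    base = [geometric_pmf(j, p) for j in range(jmax + 1)]
    total = base
    for _ in range(r - 1):
        total = convolve(total, base)
    return total
```

With r = 5 gages and p = 0.48, for instance, `r_fold_geometric(5, 0.48, 10)` should agree with `negative_binomial_pmf(j, 5, 0.48)` term by term up to rounding error.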


To test the validity of the negative binomial and the Poisson distributions for use in equation 7, data from two metropolitan areas, Chicago and New Orleans, were examined. Warm season precipitation in both of these areas is predominantly of the thunderstorm variety. Obtaining the necessary parameters for the areal rainfall model required that a storm be defined in such a manner that only the thunderstorm rainfall data would be extracted from the available daily precipitation records. After investigating several definitions of the intense runoff-producing thunderstorm, one similar to that used by Huff (10, 11) in a study of heavy, warm-season rainfall in Illinois was selected. Using daily precipitation records, a storm was defined as one in which the network mean was greater than 0.5 inch and at least one gage recorded more than 1.0 inch. Five rain gages located within a 50-square-mile area in each city with a historical record of at least 15 years were selected for the test.

Figures 1 and 2 illustrate the distribution of N, the
number of annual occurrences of events as defined
above.  For  comparison,  a  Poisson  distribution  with 
the  parameter  m  set  equal  to  the  mean  number  of 
events  is  also  shown.  Of  interest  is  that  the  Poisson 
distribution  could  not  be  rejected  for  a  variety  of 
storm  definitions  (9). 

Next,  a  procedure  for  obtaining  the  parameters 
p  and  r  of  the  negative  binomial  distribution  was 
investigated.  For  Chicago,  the  parameter  p  was 
determined  from  the  records  of  one  of  the  five 
stations  and  r  was  assumed  to  be  equal  to  n,  the 
number  of  rain  gages.  The  results  are  illustrated 
in  figure  3.  Although  the  fit  at  low  values  was  not 
outstanding,  the  tail  did  fit  well. 

For  the  New  Orleans  data,  the  parameter  p  was 
also obtained from one of the stations (fig. 4), but
the  parameter  r  was  determined  by  the  method  of 
moments.  As  illustrated  in  figure  5,  the  theoretical 
and  the  empirical  distributions  are  closer  together 
than  in  the  case  of  Chicago.  The  null  hypothesis 
that  the  mean  rainfall  was  negative  binomial 
could  not  be  rejected  at  any  level  of  significance 
using  the  Kolmogorov-Smirnov  test. 

If the gage measurements are not spatially independent and are correlated, then the parameter r will be less than n, the number of gages. At the limit, if a uniform rainfall occurs over the n gages, the result reduces to the information given by a single point and hence r = 1. The New Orleans stations, therefore, exhibit some degree of spatial dependence.

Total  Seasonal  Rainfall 

It  is  sometimes  desirable  to  determine  the  distri- 
bution of  total  precipitation  during  a  given  time 
period.  This  is  readily  done  as  shown  in  the  follow- 
ing approach: 

Let Z be the total number of units for a summer season

$$Z = R_1 + R_2 + \cdots + R_N \tag{9}$$

where $R_1, R_2, \ldots, R_N$ are mutually independent identically distributed random variables. From Feller (5), the generating function of Z is


$$M_Z(s) = M_N[M_R(s)] = \exp\left[-m + m\,\frac{1-p}{1-ps}\right] \tag{10}$$


The  probability  mass  function  of  Z  can  then  be 
obtained  from  the  above  generating  function  by 
successive  differentiations  as  follows: 


$$f_Z(j) = \frac{1}{j!}\left.\frac{d^j M_Z(s)}{ds^j}\right|_{s=0}, \qquad j = 0, 1, 2, \ldots \tag{11}$$


The mean or expected value E(Z) and the variance Var(Z) can be calculated directly without determining the probability mass function through the relationships

$$E(Z) = E(N)\,E(R)$$

$$\operatorname{Var}(Z) = \operatorname{Var}(N)[E(R)]^2 + E(N)\operatorname{Var}(R). \tag{12}$$


Simulated  Series  of  Events 

It  is  evident  that  the  total  seasonal  rainfall 
Z  can  come  from  various  combinations  of  point 
rainfall  depths  per  event  R  and  the  number  of 
events  N  during  a  season.  Table  1  is  arranged  to 
produce  these  combinations  for  m  =  5.33  and 
p  =  0.48,  values  obtained  in  the  original  analysis  of 
the  Tucson  experimental  data.  The  rows  correspond 
to  the  number  of  events  per  year,  while  the  columns 
refer  to  the  total  seasonal  rainfall.  Cell  {j,  k)  thus 
represents  the  joint  probability  of  j  units  of  rain 
and  k  events  per  year.  For  example,  the  proba- 
bility of  seven  units  of  rain  occurring  in  four  events 
is 


$$P(Z = 7, N = 4) = P(Z = 7 \mid N = 4)\,P(N = 4) = (0.0513)(0.163) = 0.00835.$$
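The entries of such a table can be recomputed along the following lines. This is a minimal sketch, assuming a Poisson number of events with mean m and the negative binomial form of equation 7 for the total depth of k events; the function names are hypothetical.

```python
from math import comb, exp, factorial


def poisson_pmf(k, m):
    """P(N = k) for a Poisson variate with mean m."""
    return exp(-m) * m**k / factorial(k)


def negative_binomial_pmf(j, r, p):
    """P(sum of r geometric depths = j), the form of equation 7."""
    return comb(r + j - 1, j) * (1.0 - p) ** r * p**j


def joint_pmf(j, k, m, p):
    """P(Z = j, N = k): j units of rain falling in k events."""
    if k == 0:
        # With no events the seasonal total can only be zero.
        return poisson_pmf(0, m) if j == 0 else 0.0
    return negative_binomial_pmf(j, k, p) * poisson_pmf(k, m)


def seasonal_pmf(j, m, p, kmax=60):
    """P(Z = j), summing the joint probabilities over k = 0..kmax events."""
    return sum(joint_pmf(j, k, m, p) for k in range(kmax + 1))
```

For m = 5.33 and p = 0.48, `poisson_pmf(4, 5.33)` and `negative_binomial_pmf(7, 4, 0.48)` should come out close to the 0.163 and 0.0513 quoted above; small differences in the last digit would be expected from rounding of the parameters.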


Within each cell, different occupancy distributions
of  storms  are  possible.  Continuing  with  the  above 
example,  without  regard  to  the  order  in  which  the 
rains  occur,  the  seven  units  can  be  combined  into 
four  events  in  three  possible  ways,  that  is,  4-1-1-1, 
3-2-1-1,  and  2-2-2-1.  Using  classical  combina- 
torial analysis  techniques,  the  probability  of  each 
of  these  occupancies  can  be  calculated  (5). 

A Monte Carlo simulation can now be set up to generate an unordered succession of yearly combinations of events. This synthetic rainfall set may be useful for determining the seasonal or annual water yield, for evaluating runoff modification and water conservation practices (urbanization, water harvesting, and artificial recharge), and for studying separately the effect of random fluctuations and the effect of control. In those instances where the order of event
occurrence  is  of  importance,  it  can  be  incorporated 
into  the  simulation  to  generate  a  time  series  of 
events.  Likewise,  a  time  series  of  runoff  events  can 
be  generated  by  transforming  rainfall  into  runoff, 
the  subject  of  the  next  section. 
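One way such a simulation might be organized is sketched below. It assumes a Poisson number of events per season and geometric depths of the form P(R = j) = (1 - p) p^j; the sampling methods (Knuth's product method and inverse transform) are common textbook choices introduced here for illustration, not procedures described in the paper.

```python
import math
import random


def sample_poisson(m, rng):
    """Draw a Poisson variate by Knuth's product method (fine for small m)."""
    limit = math.exp(-m)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_geometric(p, rng):
    """Inverse-transform draw with P(R = j) = (1 - p) * p**j, j = 0, 1, 2, ..."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return int(math.log(u) / math.log(p))


def simulate_season(m, p, rng):
    """Return the list of event depths (in units) for one synthetic season."""
    n_events = sample_poisson(m, rng)
    return [sample_geometric(p, rng) for _ in range(n_events)]


def simulate_years(n_years, m, p, seed=0):
    """Generate n_years independent synthetic seasons."""
    rng = random.Random(seed)
    return [simulate_season(m, p, rng) for _ in range(n_years)]
```

Because each season is returned as a list in the order drawn, the same output can serve either as an unordered set of events or, if the order is kept, as a simple time series of events.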
Figure 2. Distribution of occurrences of warm-season rainfall in which the areal mean of five gages in New Orleans, La., exceeded 0.50 inch and at least one gage recorded more than 1.0 inch.


Conversion  of  Rainfall  Into  Runoff 

To  obtain  the  runoff  volume  Q  from  a  rainfall 
event,  two  models  are  considered,  first  a  linear 
model  with  a  random  proportionality  factor  and  then 
the  U.S.  Soil  Conservation  Service  (SCS)  formula 
with  constant  parameters.  Using  these  models, 
the  seasonal  water  yield  is  calculated  and  the 
extreme  event  is  predicted. 

Storm  Runoff  Per  Event 

In  a  previous  paper  (8),  it  was  shown  that  for 
small  watersheds  in  the  southwestern  United 
States,  the  following  formula  yielded  good  results 
for  determining  runoff  from  thunderstorm  rainfall: 


$$Q = C(R - A) \tag{13}$$

where A is composed of the initial abstractions from the rainfall (interception and depression storage) and C is a function of rainfall characteristics for a given watershed, namely, a time distribution factor such as the maximum 15-minute intensity.

With A assumed to be a constant for a given watershed, effective rainfall can be defined as

$$P = \begin{cases} R - A & \text{for } R > A \\ 0 & \text{otherwise.} \end{cases} \tag{14}$$

This leads to

$$Q = CP. \tag{15}$$


Based on data presented in the reference just cited (8), the random variable C is assumed to have a gamma probability density function

$$f_C(x) = \frac{b^a}{\Gamma(a)}\,x^{a-1}e^{-bx}, \qquad x > 0. \tag{16}$$


It is further assumed that the rainfall depth R and the coefficient C are statistically independent.

As previously mentioned, the continuous equivalent of the geometric distribution for rainfall depths R is the exponential probability density function

$$f_R(x) = u e^{-ux}, \qquad x > 0. \tag{5}$$

An approximate value of the parameter u can be obtained by equating the means of the geometric and the exponential functions. This gives $u = \frac{1}{p} - 1$.

Using equations 5 and 14, the cumulative distribution function of P is found to be

$$F_P(x) = \begin{cases} 1 - e^{-u(x+A)} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0. \end{cases} \tag{17}$$

The mean and variance of P can be found by first obtaining the kth moment of P as follows:

$$E(P^k) = \int_0^A (0)\,f_R(x)\,dx + \int_A^\infty (x-A)^k f_R(x)\,dx. \tag{18}$$

From equation 18, the mean and variance are

$$E(P) = \frac{e^{-uA}}{u}$$

$$\operatorname{Var}(P) = E(P^2) - [E(P)]^2 = \frac{2e^{-uA} - e^{-2uA}}{u^2} \tag{19}$$

with $u = \frac{1}{p} - 1$.

Figure 4. Distribution of warm-season rainfall depths per rainy day in New Orleans, La.

Thus, with the assumption that C and P are independent, the mean and variance of Q are readily calculated from the following:

$$E(Q) = E(C)\,E(P)$$

$$\operatorname{Var}(Q) = \operatorname{Var}(C)[E(P)]^2 + [E(C)]^2 \operatorname{Var}(P). \tag{20}$$

If the cumulative distribution function of Q = CP is deemed necessary, it may be obtained by randomizing C in a manner described by Feller (6), which results in

$$F_Q(y) = \int_0^\infty F_P\!\left(\frac{y}{x}\right) f_C(x)\,dx. \tag{21}$$

Using equations 16 and 17 yields

$$F_Q(y) = 1 - \frac{b^a}{\Gamma(a)} \int_0^\infty x^{a-1} \exp\left[-u\left(\frac{y}{x} + A\right) - bx\right] dx. \tag{22}$$

Numerical methods are needed to compute this function, whose usefulness will depend on the accuracy of estimating the parameters a, b, p, and A.

An alternate method of obtaining the probability density function of Q is examined next. The SCS has established an empirical rainfall-runoff relationship whose coefficients depend on the hydrologic-soil-cover complex of a watershed (12). This relationship can be written as

$$Q = \frac{(R-A)^2}{(R-A)+S} \tag{23}$$

where A represents, as before, the initial abstractions and S is a watershed factor. Using the definition of effective rainfall given previously (equation 14) results in

$$Q = \frac{P^2}{P+S}. \tag{24}$$

In this case, the cumulative distribution function of Q is obtained by a classical transformation of random variables (2). Letting y represent Q and x represent P, then

$$y = \frac{x^2}{x+S} \tag{25}$$

or

$$x = \tfrac{1}{2}\left(y + \sqrt{y^2 + 4Sy}\right).$$

Since $F_P(x)$ is given by equation 17, equation 25 can be rewritten as:

$$F_Q(y) = 1 - \exp\left[-\frac{u}{2}\left(y + 2A + \sqrt{y^2 + 4Sy}\right)\right] \tag{26}$$

with $u = \frac{1}{p} - 1$.

For the purpose of illustration, the cumulative distribution function (equation 26), which is conditioned on the occurrence of an event, is shown in figure 6 for a given watershed in which A = 0.6, S = 3.0, and for the rainfall parameter p equal to 0.35, 0.40, and 0.45.
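The closed-form results for the SCS case are easy to evaluate directly. The following sketch codes the mean and variance of effective rainfall and the runoff distribution for the SCS transformation, assuming u = 1/p - 1 as in the text; the function names are illustrative and not from the paper.

```python
import math


def exp_rate_from_p(p):
    """Exponential parameter u matched to the geometric mean: u = 1/p - 1."""
    return 1.0 / p - 1.0


def effective_rain_moments(p, A):
    """Mean and variance of effective rainfall P for exponential depths R."""
    u = exp_rate_from_p(p)
    mean = math.exp(-u * A) / u
    var = (2.0 * math.exp(-u * A) - math.exp(-2.0 * u * A)) / u**2
    return mean, var


def scs_runoff(P, S):
    """SCS runoff Q = P**2 / (P + S) for effective rainfall P > 0, else 0."""
    return P * P / (P + S) if P > 0 else 0.0


def scs_runoff_cdf(y, p, A, S):
    """F_Q(y) for the SCS formula: 1 - exp(-(u/2)(y + 2A + sqrt(y^2 + 4Sy)))."""
    if y < 0:
        return 0.0
    u = exp_rate_from_p(p)
    return 1.0 - math.exp(-0.5 * u * (y + 2.0 * A + math.sqrt(y * y + 4.0 * S * y)))
```

A quick consistency check is that `scs_runoff_cdf(scs_runoff(x, S), p, A, S)` equals 1 - exp(-u(x + A)) for any x > 0, since the SCS formula is a monotone transformation of P. Note also that `scs_runoff_cdf(0, p, A, S)` is the probability that an event produces no runoff.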

Seasonal Water Yield

From the records of warm season rainfall, it is possible to describe the random variable R, the depth of point rainfall, by a discrete geometric distribution (equation 3) or by a continuous negative exponential distribution (equation 5). Then, with a known or assumed value for A, the rainfall's initial abstractions, the cumulative distribution function for P (equation 17) can be evaluated. A simple procedure can now be used to obtain the total runoff volume W from the summer rainfall events. The following procedure outlines this method:

1. Obtain a simulated set of point rainfall $R_1, R_2, \ldots, R_N$.

2. Transform the N R's into M P's such that
$Q_i = 0$ if $R_i \le A$
$P_i = R_i - A$ if $R_i > A$.

3. Using either the linear model (equation 15) or the SCS formula (equation 24) for relating runoff to rainfall, obtain the set $Q_1, Q_2, \ldots, Q_M$.

4. Then, the total water yield for the season is $W = Q_1 + Q_2 + \cdots + Q_M$.

5. The mean and variance of this simulated set of runoff events are given by

$$E(W) = E(M)\,E(Q)$$

$$\operatorname{Var}(W) = \operatorname{Var}(M)[E(Q)]^2 + \operatorname{Var}(Q)[E(M)]. \tag{27}$$
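The five steps above can be sketched as a short simulation. The version below is only one possible arrangement: it assumes Poisson event counts with mean m, exponential depths with u = 1/p - 1, and the SCS formula for step 3. The Poisson sampler and all function names are assumptions made for the example.

```python
import math
import random


def sample_poisson(m, rng):
    """Draw a Poisson variate by Knuth's product method (fine for small m)."""
    limit = math.exp(-m)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def scs_runoff(P, S):
    """SCS runoff for effective rainfall P > 0."""
    return P * P / (P + S)


def simulate_seasonal_yield(n_seasons, m, p, A, S, seed=0):
    """Return (mean of W, variance of W, mean of M) over n_seasons seasons."""
    rng = random.Random(seed)
    u = 1.0 / p - 1.0
    yields, counts = [], []
    for _ in range(n_seasons):
        # Step 1: simulated point rainfall depths R_1..R_N.
        rain = [rng.expovariate(u) for _ in range(sample_poisson(m, rng))]
        # Step 2: keep only runoff-producing events, P_i = R_i - A.
        effective = [r - A for r in rain if r > A]
        # Step 3: convert effective rainfall to runoff.
        runoff = [scs_runoff(P, S) for P in effective]
        # Step 4: seasonal water yield W.
        yields.append(sum(runoff))
        counts.append(len(runoff))
    # Step 5: sample mean and variance of W, plus the mean number of events M.
    mean_w = sum(yields) / n_seasons
    var_w = sum((w - mean_w) ** 2 for w in yields) / (n_seasons - 1)
    mean_m = sum(counts) / n_seasons
    return mean_w, var_w, mean_m
```

Under these assumptions the number of runoff-producing events M is itself Poisson with mean m exp(-uA), which gives a simple check on the simulated mean of M. The parameter values are up to the user; A = 0.6 and S = 3.0 are the values used for figure 6.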

Prediction  of  Extreme  Events 

With the probability mass function of areal rainfall B given by equation 7, the cumulative distribution function is simply

$$F_B(k) = \sum_{j=0}^{k} \binom{r+j-1}{j}(1-p)^r p^j, \qquad k = 0, 1, 2, \ldots \tag{28}$$

This is conditioned on the occurrence of an event as previously defined, such as the rain gage network mean greater than 0.5 inch and at least one gage greater than 1.0 inch. A storm definition of this type serves two purposes. First, it tends to exclude all rainfall other than thunderstorm rainfall and, second, only runoff-producing rainfall is considered.

To obtain the distribution function of the maximum volume of runoff for a season, $\phi_Q(y)$, called the maximal or extremal distribution, two methods can be used.

The first method consists of using the Monte Carlo simulation to generate seasonal areal rainfall sets in a manner similar to that in which the point rainfall sets were generated. From these sets, runoff values are computed using either one of the two rainfall-runoff relationships. The relative frequency of yearly floods of magnitude y is then readily calculated from the runoff sets. By definition, the cumulative frequency of annual floods equal to or less than y units is $\phi_Q(y)$. The return period in years for a flood of magnitude y is simply the reciprocal of $1 - \phi_Q(y)$.


Figure  6.  — Distribution  function  of  storm  runoff  volumes  per  event  using  the  SCS  formula  for  a  given  watershed  (A  =  0.6  and  S  =  3.0) 
The second method is analytic and uses the probability mass function $f_M(j)$ of the yearly number of runoff-producing events for a particular watershed. For the case of areal rainfall B, the cumulative distribution function is given by the equation

$$\phi_B(k) = \sum_{j=0}^{\infty} [F_B(k)]^j f_M(j), \qquad k = 0, 1, 2, \ldots \tag{29}$$

If it is assumed that M is a Poisson variate with a mean m, then equation 29 can be written as

$$\phi_B(k) = \exp\left(-m[1 - F_B(k)]\right). \tag{30}$$

For illustrative purposes, equation 30 is plotted in figure 7 for m = 10 and r = 3.

In a similar fashion, the maximal distribution function of seasonal runoff $\phi_Q(y)$ may be found. With the SCS formula for converting rainfall to runoff, the cumulative distribution of runoff volumes $F_Q(y)$ is determined from equation 26. Then, together with the Poisson distribution for the annual number of events, the maximal distribution of annual runoff events is obtained from the equation

$$\phi_Q(y) = \exp\left(-m[1 - F_Q(y)]\right). \tag{31}$$

Figure 8 is a plot of equation 31 for m = 10.
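Equation 31 and the return period defined earlier can be evaluated together. The sketch below assumes the SCS runoff distribution of equation 26 with u = 1/p - 1 and a Poisson number of runoff-producing events with mean m; the function names are hypothetical.

```python
import math


def runoff_cdf(y, p, A, S):
    """F_Q(y) for the SCS formula with exponential rainfall depths."""
    if y < 0:
        return 0.0
    u = 1.0 / p - 1.0
    return 1.0 - math.exp(-0.5 * u * (y + 2.0 * A + math.sqrt(y * y + 4.0 * S * y)))


def maximal_runoff_cdf(y, m, p, A, S):
    """phi_Q(y) = exp(-m [1 - F_Q(y)]): largest seasonal runoff not exceeding y."""
    return math.exp(-m * (1.0 - runoff_cdf(y, p, A, S)))


def return_period(y, m, p, A, S):
    """Return period in years: reciprocal of 1 - phi_Q(y)."""
    return 1.0 / (1.0 - maximal_runoff_cdf(y, m, p, A, S))
```

For example, `return_period(2.0, 10, 0.40, 0.6, 3.0)` gives the average interval between seasons whose largest runoff event exceeds 2.0 inches under these assumed parameter values; the result is only as reliable as the estimates of m, p, A, and S.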


Figure 7. Maximal distribution function of areal mean rainfall for r = 3 and m = 10. (Horizontal axis: probability of rainfall not exceeding k inches.)
One  of  the  problems  that  may  develop  with  this 
procedure  concerns  reproduction  of  the  extreme 
or  rare  event.  Parameter  estimation  is  part  of  the 
problem  but  so  is  selection  of  the  distribution  func- 
tion itself.  Many  of  the  hydrologic  processes  have 
exceedingly  long  tails  that  are  difficult  to  charac- 
terize with  the  more  commonly  used  distribution 
functions.  A  possible  solution  to  this  dilemma  is 
the  use  of  mixed  distributions.  For  example,  the 
geometric  distribution  has  been  selected  to  repre- 
sent point  rainfall  depths,  R.  In  some  instances,  a 
better  fit  has  been  obtained  by  modifying  the  tail 
of  this  distribution  through  the  addition  of  a  uni- 
form distribution.  This  mixed  distribution  can  be 


represented  as  the  weighted  sum  of  two  distribu- 
tions, one  discrete  and  one  continuous,  as  follows: 


$$F_R(x) = \alpha\,G_R(x) + (1-\alpha)\,H_R(x) \tag{32}$$

where $\alpha$ is a weighting factor that ranges between 0 and 1, $G_R(x)$ is the geometric distribution, and $H_R(x)$ is the uniform distribution. For thunderstorm rainfall, $\alpha$ is generally greater than 0.90.

Figure 8. Maximal distribution function of runoff volume for m = 10, A = 0.6, and S = 3.0.

Another way to modify the geometric distribution is to form a new compound distribution by treating the parameter p as a random variable. Assume that p has a beta distribution

$$f_p(x) = T\,x(1-x), \qquad 0 \le x_1 \le x \le x_2 \le 1 \tag{33}$$

where T is a constant such that the above equation is a probability density function. For simplicity's sake, let $x_1 = 0$ and $x_2 = 1$, so that T = 6. Thus, equation 33 becomes

$$f_p(x) = 6(x - x^2) \tag{34}$$

which can be truncated if necessary.

The probability mass function of R is now

$$f_R(j) = \int_0^1 (1-x)\,x^j f_p(x)\,dx. \tag{35}$$

Reducing this results in

$$f_R(j) = \frac{12}{(j+2)(j+3)(j+4)} \tag{36}$$

which is a distribution with a long tail. In a similar fashion, the distribution of areal rainfall B can be obtained by randomizing p, provided that the gages are spatially independent.
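The reduction from equation 35 to equation 36 and the long-tail remark can be verified with exact arithmetic. The sketch below assumes the untruncated case T = 6 on (0, 1); the tail formula 6/((j+2)(j+3)) follows from summing equation 36 and is added here for illustration.

```python
from fractions import Fraction


def compound_pmf(j):
    """Equation 36: P(R = j) = 12 / ((j + 2)(j + 3)(j + 4))."""
    return Fraction(12, (j + 2) * (j + 3) * (j + 4))


def compound_pmf_by_integration(j):
    """Equation 35 integrated term by term with f_p(x) = 6 x (1 - x).

    The integrand is 6 x**(j+1) (1 - x)**2 = 6 (x**(j+1) - 2 x**(j+2) + x**(j+3)).
    """
    return 6 * (Fraction(1, j + 2) - Fraction(2, j + 3) + Fraction(1, j + 4))


def compound_tail(j):
    """P(R >= j) for the compound distribution, obtained by telescoping."""
    return Fraction(6, (j + 2) * (j + 3))


def geometric_tail(j, p):
    """P(R >= j) = p**j for the plain geometric distribution."""
    return Fraction(p) ** j
```

At j = 10, for instance, the compound tail is 6/156, roughly 0.038, whereas a geometric distribution with p = 1/2 (the mean of the assumed beta density) has a tail of about 0.001, which illustrates how much heavier the compound tail is.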

Discussion  and  Conclusions 

The  proposed  models  have  several  shortcomings. 
Independence  assumptions  have  been  made  be- 
tween events,  between  the  number  and  magnitude 
of  events,  as  well  as  within  an  event  by  assuming 
independence  of  time  distribution  and  total  amount 
of  storm  rainfall.  Distribution  functions  have  been 
hypothesized,  and  the  effects  of  parameter  uncer- 
tainty on  the  results  have  not  been  thoroughly 
assessed.  In  order  to  do  the  latter,  a  managerial 
goal  must  be  defined  together  with  the  notion  of 
economic  risk.  Then,  the  effect  of  parameter 
uncertainty  can  be  ascertained  using  Bayesian 
Decision  Theory  (4). 

Another  restrictive  aspect  of  this  paper  is  that 
only  one  type  of  rainfall  has  been  considered, 
namely,  the  thunderstorm  variety,  which  occurs 
primarily  during  the  summer.  Winter  precipitation 
exhibits  different  characteristics,  such  as  lower 
intensity,  longer  duration,  more  uniform  areal 
distribution,  and  some  persistence  from  one  event 
to the next. Ultimately, if and when validated models of summer and winter runoff become available, the annual yield of water can be obtained by adding the two corresponding random variables (13).

On  the  other  hand,  substantiating  evidence  on 
the  validity  of  the  models  has  been  found  in  a  study 
of  the  frequency  distribution  of  annual  number 
and  duration  of  events  in  which  an  independent 
set  of  data  was  used  (13). 

In  conclusion,  the  following  points  have  been 
demonstrated: 

1.  An  event-based  description  of  thunderstorm 
rainfall  is  preferable  to  an  equispaced  one  because 
of  the  temporally  independent,  infrequent,  ir- 
regularly spaced  occurrences  of  this  type  of  pre- 
cipitation. 

2.  Whether  in  Tucson,  Chicago,  or  New  Orleans, 
the  following  distribution  functions  cannot  be  re- 
jected for  warm  season  rainfall: 

(a)  Poisson,   for  the  number  of  events  per 
season. 

(b)  Geometric,  for  the  depth  of  rainfall  at  a 
point. 

(c)  Negative  binomial,  for  the  average  depth  of 
rainfall. 

As  a  result,  three  parameters  at  most  are  necessary 
for  obtaining  the  maximal  distribution  function. 

3.  Monte  Carlo  simulation  can  be  performed  to 
obtain  seasonal  sets  of  rainfall  events. 

4. Randomizing the multiplicative constant in a linear rainfall-runoff relationship accounts for a randomly varying time factor of the rainfall hyetograph. This relationship can then be used to obtain
the  distribution  function  for  storm  runoff  per  event. 
An  alternate  method  is  a  transformation  of  random 
variables  using  the  SCS  formula. 

5.  The  total  seasonal  water  yield  from  a  given 
watershed  can  be  computed,  either  analytically  or 
by  Monte  Carlo  simulation. 
List of Symbols

a        parameter of gamma distribution
A        initial abstraction, inches
b        parameter of gamma distribution
B        areal rainfall
C        rainfall-runoff proportionality factor
         duration of flow
E(·)     expected value of (·)
f_V(·)   probability mass (or density) function of discrete (or continuous) random variable V
F_V(·)   cumulative distribution function of discrete (or continuous) random variable V
j, k     discrete dummy variables
m        average number of events per season
M        number of runoff-producing rainfalls per season
M_V(·)   moment generating function of discrete (or continuous) random variable V
n        number of rain gages over area considered
N        number of rainfall events per season
p        parameter of the geometric distribution
P        total effective rainfall per event, inches (P = R - A)
Q        total runoff per event, inches
         peak runoff per event
r        parameter of negative binomial distribution
R        total rainfall per event, inches
R_j      precipitation at point j
s        dummy variable for generating functions
S        constant in SCS rainfall-runoff formula
T        normalizing constant in beta distribution
u        parameter of exponential distribution
V        random variable
Var(·)   variance of (·)
W        seasonal water yield, inches
x, y     continuous dummy variables
Z        total number of units of precipitation per season
α        weighting factor in mixed distribution
Γ(·)     gamma function
φ_V(·)   maximal (or extremal) distribution function of V
RESEARCH IN STOCHASTIC MODELS FOR BEDLOAD TRANSPORT


By C. S. Hung and H. W. Shen ¹


Abstract 

Research  in  stochastic  models  for  bedload  trans- 
port is  reviewed,  summarized,  and  critically  ex- 
amined. Discussions  are  given  for  (1)  theoretical 
models  in  deriving  the  distributions  of  the  step 
lengths and the rest periods; (2) theoretical derivation of the probability density function of the distance at a given time; (3) calculation of bedload transport from stochastic models; and (4) experimental evidence (including recent single particle experimental results by the authors).

The  order  of  generality  of  all  existing  theoretical 
models  is  evaluated  and  the  relationships  among 
these  are  given. 

Introduction 

Tractive  force  acting  on  the  boundary  by  the 
flow  has  long  been  considered  the  most  important 
factor  in  determining  the  sediment  transport  rate. 
DuBoys (3) and subsequently many others (see Raudkivi (15), Graf (i) and Shen (18)) have developed various equations to relate sediment bedload transport rate with the tractive force and the critical tractive force. Kalinske (12) derived his
theoretical  formula  by  considering  the  influence  of 
turbulence  and  assuming  that  the  instantaneous 
sediment  particle  velocity  was  proportional  to  the 
difference  between  the  instantaneous  fluid  velocity 
at  the  particle  level  and  the  critical  fluid  velocity 
for  the  incipient  particle  motion.  All  of  these  de- 
velopments ignore  the  actual  nature  of  sediment 
bedload  movements  and  have  optimistically  assumed 
that  the  bedload  rate  can  be  grossly  described  by  a 
deterministic  function  of  certain  flow  parameters. 
Unfortunately, after decades of searching, no universally accepted sediment transport equation has been found by this method. Perhaps the most promising empirically developed equation is that developed by Shen and Hung (20) using a multiple regression technique. The sediment concentration is found to be a function of the flow velocity, the energy slope, and the fall velocity of the median sediment size of the bed sample. Although all available primary data agree with this relationship remarkably well, it is doubtful that this relationship can be applied to well-graded sediment sizes, and it does not provide much understanding of the actual movement of sediment particles.

¹ Graduate student and professor, respectively, Department of Civil Engineering, Colorado State University, Fort Collins.

The  migration  process  of  a  sediment  particle  on 
an  alluvial  bed  actually  consists  of  a  sequence  of 
random  steps  and  random  rest  periods.  Einstein 
(4)  developed  the  first  probabilistic  model  based  on 
this  concept.  Unfortunately,  his  dissertation  de- 
scribing this  model  was  written  in  German  and  was 
little  known  to  English-speaking  researchers  until 
the  early  1960's.  In  the  past  decade,  progress  in  the 
development  of  stochastic  sediment  models  has 
clearly  illustrated  the  potential  of  this  approach. 
The  purposes  of  this  paper  are  to  critically  examine 
the  following  items: 

1.  How  are  the  theories  of  the  probability, 
statistics,  and  stochastic  processes  being  used  to 
develop  various  existing  bedload  transport  models? 

2.  What  are  the  merits  and  faults  of  existing 
theoretical  models? 

3.  What  are  the  similarities  and  interrelationships 
among  the  existing  theoretical  models? 

4.  How  are  the  experiments  conducted  to  verify 
the  proposed  models  and  to  estimate  the  required 
parameters  after  the  models  have  been  accepted? 

5.  Our  findings. 

6.  Our  suggestions  for  future  research. 

The  analysis  of  this  paper  is  presented  as  follows: 

1.  Theoretical  models  in  deriving  the  distribu- 
tions of  the  step  lengths  and  the  rest  periods. 

2. Theoretical derivation of the probability density function (p.d.f.) of the travel distance X in time t, $f_t(x)$.

3. Bedload transport rates.

4.  Experimental  results  of  the  dispersion  of  a 
group  of  bedload  particles  and  their  implications. 

5.  Experimental  results  of  the  single  particle 
movement  and  their  implications. 

Theoretical  Models  in  Deriving  the 
Distributions  of  the  Step  Lengths 
and  the  Rest  Periods 


Einstein's Model. Einstein (4) was, perhaps, the first to observe and verify that sediment bedload particles moved in a sequence of steps and rests. Noting that the rest period was random and was usually much longer than the movement periods, he suggested considering movement periods negligible in comparison with the rest periods and classified the motion process by dividing it into time steps without an advance in distance and distance steps without an advance in time. The bedload movement is then described by two random variables, the step length and the rest period, one following the other alternately.

Einstein investigated the particle movement in the x-t plane on a Galton's board. Particles move on this plane by making either x-direction steps or t-direction steps. The actual sediment particle path line is a zigzag line consisting of a series of alternate step lengths and rest periods. The step lengths and the rest periods are considered sums of a certain number of Δx's and Δt's. The particle is assumed to be able to make "small steps" either in distance or in time at the ends of each Δx and Δt.
Since  he  was  interested  in  the  bedload  transport  in 
a  stationary  equilibrium  flow  and  ".  .  .  all  cross- 
sections  are  equivalent  and  that  the  velocity  of 
flow  is  independent  of  time,"  Einstein  stated  that 
".  .  .  the  only  plausible  assumption  for  our  Galton's 
board  is  that  the  probability  (for  a  particle  to  make 
a  small  step  either  in  distance  or  in  time)  is 
constant." 

If  one  lets  p  be  the  probability  of  a  particle  making 
a  small  time  step  without  an  advancement  in  posi- 
tion and  q  be  the  probability  of  a  particle  making 
a  small  distance  step  without  an  advancement  in 
time  (the  duration  of  step  is  being  neglected  as 
stated  previously),  then  the  sum  of  p  and  q  is  one 


because  a  particle  can  either  take  a  time  step  or  a 
distance step. The percentage of particles (in terms of total number released at x = 0) which stay at x = iΔx, i = 0, 1, 2, ..., is equivalent to the probability of a particle staying at x = iΔx after making a distance step from x = 0 and is equal to $pq^i$, i = 0, 1, 2, .... Note that if a particle stays at x = iΔx after making a distance step, the particle has made i successive small distance steps in a negligible or very short time period.

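Let X be the step length; it is geometrically distributed with the p.m.f.

$$P[X = n\Delta x] = pq^n, \qquad n = 0, 1, 2, \ldots \tag{1}$$

The mean step length is then equal to

$$\sum_{i=0}^{\infty} (i\Delta x)\,pq^i = \frac{q}{p}\,\Delta x. \tag{2}$$

Note that

$$P[X \ge n\Delta x] = \sum_{i=n}^{\infty} pq^i = q^n \tag{3}$$

and

$$P[X = (n+k)\Delta x] = pq^{n+k} = q^n \cdot pq^k = P[X \ge n\Delta x] \cdot P[X = k\Delta x]. \tag{4}$$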

Equation 4 states that the percentage of total particles which stay at x = (n + k)Δx after making a distance step is equal to the percentage of total particles which pass x = nΔx times the probability of a particle making k more small successive distance steps and staying at (n + k)Δx.

In the real situation, the random variable $X$ is not necessarily an integer multiple of some fixed small distance interval, $\Delta x$; that is, instead of being discrete, $X$ is continuous. Suppose the step length has the p.d.f. $f_X(x)$; then the probability of $X$ being in $(x, x + \Delta x)$ is $f_X(x)\Delta x$. The counterpart of equation 4 for continuous $X$ is

$$f_X(x_0 + x)\,dx = \left[\int_{x_0}^{\infty} f_X(x)\,dx\right] \cdot f_X(x)\,dx. \tag{5}$$

It is easily seen that the following exponential function satisfies equation 5:

$$f_X(x) = k_1 e^{-k_1 x} I_{(0,\infty)}(x), \tag{6}$$


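where $I$ is the indicator function. Note that the mean step length in this case is equal to

$$E[X] = \int_0^{\infty} x \cdot k_1 e^{-k_1 x}\,dx = \frac{1}{k_1}, \tag{7}$$

and its variance is equal to

$$\operatorname{Var}[X] = \int_0^{\infty} \left(x - \frac{1}{k_1}\right)^2 k_1 e^{-k_1 x}\,dx = \frac{1}{k_1^2}. \tag{8}$$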


From equations 2 and 7,

$$\frac{1}{k_1} = \frac{q}{p}\,\Delta x = \frac{1 - p}{p}\,\Delta x,$$

or

$$p = \frac{k_1 \Delta x}{1 + k_1 \Delta x} = k_1 \Delta x\left[1 - k_1 \Delta x + (k_1 \Delta x)^2 - \cdots\right] \approx k_1 \Delta x \tag{9}$$

if higher order terms are ignored.

The physical meaning of this result will be discussed in the summary at the end of this section. A concluding remark to be made here is that the step length is found to be exponentially distributed according to equation 6 with an unknown mean of $1/k_1$ from equation 7.
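
The passage from the discrete (geometric) step length to the exponential one can be checked numerically. The following is only a minimal sketch, not part of the original derivation: the function names `einstein_p` and `geometric_tail` and the values of `k1`, `x`, and `dx` are assumptions chosen for illustration. It takes $p = k_1\Delta x/(1 + k_1\Delta x)$ from equation 9 and compares the geometric tail $q^n$ of equation 3 with the exponential tail $e^{-k_1 x}$ as $\Delta x$ shrinks.

```python
import math


def einstein_p(k1, dx):
    """Stopping probability p obtained by equating the means in equations 2 and 7."""
    return k1 * dx / (1.0 + k1 * dx)


def geometric_tail(k1, dx, x):
    """P[X >= n*dx] = q**n (equation 3), with n = x/dx rounded to an integer."""
    p = einstein_p(k1, dx)
    n = round(x / dx)
    return (1.0 - p) ** n


if __name__ == "__main__":
    k1, x = 2.0, 1.5  # illustrative values only
    for dx in (0.1, 0.01, 0.001):
        # p approaches k1*dx and the geometric tail approaches exp(-k1*x)
        print(dx, einstein_p(k1, dx), k1 * dx,
              geometric_tail(k1, dx, x), math.exp(-k1 * x))
```

As $\Delta x$ decreases, the printed values of $p$ and $k_1\Delta x$ agree to higher order, and the geometric tail approaches the exponential tail.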

Similarly, the rest period, $T$, has the following p.d.f.:

$$f_T(t) = k_2 e^{-k_2 t} I_{(0,\infty)}(t). \tag{10}$$


Sayre-Hubbell's One-Dimensional Model. - Sayre and Hubbell (17) used the standard derivation of a homogeneous Poisson process to derive the step length and the rest period distributions. These derivations can be found in Bailey (1, pp. 67-69) and other stochastic process texts.


It is assumed that (1) the probability of a particle making a step at a location in any interval $(x, x + \Delta x]$, $x > 0$, is everywhere the same (making a step at a location means resting there and then making a step; a particle that does not make a step at a given location passes through it without resting); (2) the probability of a particle making a step in any interval $(x, x + \Delta x]$, $x > 0$, is independent of its previous history, is proportional to $\Delta x$, and is assumed to be equal to $k_1\Delta x$; and (3) the probability for more than one step to occur in any interval $(x, x + \Delta x]$ is zero (or of the order $o(\Delta x)$).

Let $p_i(x)$ be the probability of a particle making exactly $i$ steps in the interval $(0, x]$. The probability of making no step in $(0, x + \Delta x]$ is equal to the product of the probability of making no step in $(0, x]$ and the probability of making no step in $(x, x + \Delta x]$; thus,

$$p_0(x + \Delta x) = p_0(x)\,[1 - k_1\Delta x]. \tag{11}$$

This implies

$$\lim_{\Delta x \to 0} \frac{p_0(x + \Delta x) - p_0(x)}{\Delta x} = -k_1 p_0(x), \tag{12}$$

or

$$\frac{dp_0(x)}{dx} = -k_1 p_0(x). \tag{13}$$


It is assumed that the first jump occurs at $x > 0$ and hence,

$$p_0(0) = 1. \tag{14}$$

With the above boundary condition, equation 13 yields the solution

$$p_0(x) = e^{-k_1 x}. \tag{15}$$


This is the probability of making zero jumps in $(0, x]$. The distribution function of $X$, the step length, is

$$F_X(x) = P[\text{step length } X \le x] = 1 - P[X > x] = 1 - P\{\text{no jump occurs in } (0, x]\} = 1 - p_0(x).$$

The density function of $X$ is then

$$f_X(x) = \frac{dF_X(x)}{dx} = -\frac{dp_0(x)}{dx}, \tag{16}$$

or

$$f_X(x) = k_1 e^{-k_1 x} I_{(0,\infty)}(x), \tag{17}$$

which is the same as Einstein's equation 6. Similarly, the density function of $T$, the rest period, is then

$$f_T(t) = k_2 e^{-k_2 t} I_{(0,\infty)}(t), \tag{18}$$

which is the same as Einstein's equation 10.

Shen-Todorovic's General One-Dimensional Model (G.O.M.). - Assuming that the processes in distance and in time are independent nonhomogeneous processes, that is, using $k_1(x)$ and $k_2(t)$ to replace Sayre and Hubbell's $k_1$ and $k_2$, Shen and Todorovic (21) obtained the following counterpart of equation 13:

$$\frac{dp_0(x)}{dx} = -k_1(x)\,p_0(x). \tag{19}$$

With the same boundary condition given in equation 14, the solution becomes

$$p_0(x) = \exp\left[-\int_0^x k_1(s)\,ds\right]. \tag{20}$$

Following arguments similar to those given by Sayre and Hubbell (17), that is, using equation 16, we have

$$f_X(x) = -\frac{dp_0(x)}{dx} = k_1(x)\exp\left[-\int_0^x k_1(s)\,ds\right] I_{(0,\infty)}(x). \tag{21}$$

Similarly, we have

$$f_T(t) = k_2(t)\exp\left[-\int_0^t k_2(s)\,ds\right] I_{(0,\infty)}(t). \tag{22}$$

Equations 21 and 22 form a general model which will be denoted as Shen-Todorovic's G.O.M.

Summary Comments of the Models Presented in this Section. - The models presented with different derivations by Einstein (4) and Sayre and Hubbell (17) reach the same result, namely, both distributions of the step length and the rest period are exponential. The probability, $p$, of a particle making a jump at the end of any discrete point $x = i\Delta x$ (used by Einstein) is essentially the same as the probability $k_1\Delta x$ used by Sayre and Hubbell, if the higher order terms of the latter are ignored. This has been clearly shown in equation 9. Besides, the assumptions used in deriving both models are the same. We then conclude that these two models are simply different versions of the same model, which will be denoted as the exponential-exponential model, or the E.E.M.

In  a  uniform,  steady,  equivalent  flow,  the  distri- 
butions of  the  step  length  and  the  rest  period  should 
not  change  with  distance,  or  time,  or  both.  This  does 
not  mean  that  the  probability  of  a  given  particle 
making  a  jump  at  any  position  or  at  any  time  is 
constant.  Hence,  the  assumption  of  the  homogeneity 
of  the  processes  for  a  given  particle  making  a  jump 
at  any  position  and  at  any  time  is  questionable. 
Recent experimental evidence (see Hung and Shen (11)) appears to disagree with the E.E.M. since neither the step length nor the rest period is exponentially distributed.

Shen and Todorovic (21) derived their promising G.O.M. by relaxing the assumption of the homogeneity of the processes both in distance and in time. By varying the functions $k_1(x)$ and $k_2(t)$, the step length distribution might be homogeneous or nonhomogeneous along the distance, and the rest period distribution might be homogeneous or nonhomogeneous in time. Since it is very hard for one to imagine a case in which the step length distribution varies with the distance but not with the time while the rest period distribution varies with the time but not with the distance, the importance of this model lies not in its validity in nonuniform, or unsteady cases, or both, but in the steady uniform case.

The relation between the step length distribution and the function $k_1(x)$ for the uniform steady case is developed here. Suppose the density function of the step length is given by $f_X(x)$ with its $n$-fold convolution $f_X^{(n)}(x)$; then the probability of a given particle, which was initially put at distance $x = 0$, depositing in the interval $(x, x + \Delta x]$ at the end of the $n$th jump, $f_n(x)\Delta x$, is

$$f_n(x)\,\Delta x = \left[\int_0^x f_X^{(n-1)}(\xi)\,f_X(x - \xi)\,d\xi\right]\Delta x, \quad n = 2, 3, 4, \ldots \tag{23}$$

with

$$f_1(x) = f_X(x) \tag{24}$$

as seen from figure 1. Since $k_1(x)\Delta x$ is the probability that the given particle makes a jump at the particular interval $(x, x + \Delta x]$, it must be equal to the probability that the particle will eventually deposit in that interval, or

$$k_1(x)\,\Delta x = \sum_{n=1}^{\infty} f_n(x)\,\Delta x,$$

or

$$k_1(x) = \sum_{n=1}^{\infty} f_n(x). \tag{25}$$

If the step length is assumed to be exponentially distributed with the density given by equation 17, then

$$f_n(x) = k_1 e^{-k_1 x}\,\frac{(k_1 x)^{n-1}}{(n-1)!} \tag{26}$$

and

$$k_1(x) = \sum_{n=1}^{\infty} f_n(x) = k_1 e^{-k_1 x} \sum_{n=1}^{\infty} \frac{(k_1 x)^{n-1}}{(n-1)!} = k_1. \tag{27}$$

This is the assumption used by Einstein and Sayre and Hubbell to derive their E.E.M.

Figure 1. - Sketch for deriving the function $k_1(x)$ based on any $f_X(x)$.

Similarly, if the step length is assumed to be gamma distributed with the density given in equation 56, $k_1(x)$ can be found to be:

$$k_1(x) = k_1 e^{-k_1 x} \sum_{n=1}^{\infty} \frac{(k_1 x)^{nr-1}}{\Gamma(nr)}. \tag{28}$$

It is interesting to note that equation 56 cannot be obtained if one introduces equation 28 into equation 21. The possible reason is that Shen and Todorovic ignored the higher order terms in their derivation.

$k_1(x)\Delta x$ is the summation of an infinite number of probabilities, $f_n(x)\Delta x$, and its physical meaning is quite clear. Unfortunately, the exact functional form of $k_1(x)$ is rather difficult to obtain from mere physical reasoning, and it is even hard to say whether $k_1(x)$ should be zero or not at $x = 0$. However, $k_1(x)$ is expected to approach a constant value as $x \to \infty$ in a uniform steady flow. Similar arguments discussed here can be applied to the distribution of the rest period.
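
The behavior of $k_1(x)$ in equations 27 and 28 can be explored numerically. The sketch below is an illustration only, assuming a gamma step length with shape $r$ and rate $k_1$; the function name `renewal_density` and its truncation parameters are hypothetical. It sums the series of equation 28 term by term in logarithmic form. For $r = 1$ it should return the constant $k_1$ of equation 27; for $r = 2$ the series has the closed form $k_1(1 - e^{-2k_1 x})/2$, which tends to $k_1/2$ for large $x$.

```python
import math


def renewal_density(x, k1, r, tol=1e-15, max_terms=10000):
    """Sum of equation 28: k1 * exp(-k1 x) * sum_{n>=1} (k1 x)^(n r - 1) / Gamma(n r)."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    total = 0.0
    for n in range(1, max_terms + 1):
        # Work with logarithms so that large powers and gamma values do not overflow.
        log_term = (math.log(k1) + (n * r - 1.0) * math.log(k1 * x)
                    - k1 * x - math.lgamma(n * r))
        term = math.exp(log_term)
        total += term
        # Terms decrease once n*r exceeds k1*x; stop when they become negligible.
        if n * r > k1 * x and term < tol * total:
            break
    return total


if __name__ == "__main__":
    for x in (0.1, 0.5, 2.0, 10.0):
        print(x, renewal_density(x, 1.0, 1.0), renewal_density(x, 1.0, 2.0))
```

Under these assumptions the $r = 2$ column starts near zero at small $x$ and levels off at $k_1/2$, which illustrates the constant limiting value mentioned above.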

Theoretical Derivations of the Probability Density Function of the Distance at a Given Time, $f_t(x)$

If a particle is released at $x = 0$ when $t = 0$, $f_t(x)\Delta x$ is the probability that the particle will remain in the interval $(x, x + \Delta x)$ at time $t$. It also represents the fraction of the particles which are expected to remain in that interval at the given time $t$ if a group of particles of the same kind are released. The function $f_t(x)$ is of great importance since (1) it represents the dispersion of the bedload particles in the alluvial channel, and (2) it can be used to check the validity of the proposed model and to estimate the required parameters by carrying out the experiment of the group movement of tracer particles. Theoretical derivations of $f_t(x)$ for different models will be discussed in the following paragraphs.

Einstein's E.E.M. - Einstein (4) considered the case where all particles were released into the stream at $x = 0$ when $t = 0$ so that all particles will make their first step immediately. The path line of a specific particle is shown in figure 2.

Figure 2. - Sketch used to find $f_t(x)$ by Einstein (1937).


The  small  circles  with  numbers  attached  to  them 
indicate  the  end  of  double  phases  (a  double  phase 
contains  a  step  length  and  a  rest  period).  Since 


the step length and the rest period are independently and identically distributed (or i.i.d.) with exponential distributions (as obtained in the previous section), the probability of a particle following the specific path and reaching $(x, x + dx)$ at time $t$ (that is, falling into the shaded area ABCD in fig. 2) after completing exactly $n + 1$ double phases is

$$\left[\prod_{i=1}^{n+1} f_X(x_i - x_{i-1})\,dx_i\right]\left[\prod_{i=1}^{n+1} f_T(t_i - t_{i-1})\,dt_i\right] = e^{-k_1 x_{n+1}}\,e^{-k_2 t_{n+1}}\left(\prod_{i=1}^{n+1} k_1\,dx_i\right)\left(\prod_{i=1}^{n+1} k_2\,dt_i\right).$$

The probability that the particle first falls into the shaded area after completing exactly $n + 1$ double phases with all possible paths is then equal to:

$$P_{n+1} = \left[k_1 e^{-k_1 x}\,dx \int_{x_n=0}^{x} k_1\,dx_n \int_{x_{n-1}=0}^{x_n} k_1\,dx_{n-1} \cdots \int_{x_1=0}^{x_2} k_1\,dx_1\right] \cdot \left[\int_{t_{n+1}=t}^{\infty} k_2 e^{-k_2 t_{n+1}}\,dt_{n+1} \int_{t_n=0}^{t} k_2\,dt_n \int_{t_{n-1}=0}^{t_n} k_2\,dt_{n-1} \cdots \int_{t_1=0}^{t_2} k_2\,dt_1\right],$$

which yields

$$P_{n+1} = k_1 e^{-k_1 x - k_2 t}\,\frac{(k_1 x)^n}{n!}\,\frac{(k_2 t)^n}{n!}\,dx.$$

Let $f_t(x)\,dx$ be the probability that the particle falls in the shaded area with all possible numbers of steps; then:

$$f_t(x)\,dx = \sum_{n=0}^{\infty} P_{n+1},$$

or

$$f_t(x) = k_1 e^{-k_1 x - k_2 t} \sum_{n=0}^{\infty} \frac{(k_1 x)^n}{n!}\,\frac{(k_2 t)^n}{n!}. \tag{29}$$

The area under the curve $f_t(x)$ versus $x$ is equal to unity since all particles are forced to make their first step at $x = 0$ when $t = 0$ with probability one.

$\mu_t$, the mean travel distance at time $t$, is

$$\mu_t = \int_0^{\infty} x \cdot f_t(x)\,dx = \frac{1}{k_1} + \frac{k_2}{k_1}\,t = \mu_X + \frac{\mu_X}{\mu_T}\,t = (1 + N_t)\,\mu_X, \tag{30}$$

where $\mu_X$ and $\mu_T$ are the mean step length and the mean rest period, and $N_t = t/\mu_T$ is the average jump number occurring in $(0, t]$. Equation 30 implies that the centroid of the area under the curve $f_t(x)$ moves with the constant rate

$$\bar{V} = \frac{d\mu_t}{dt} = \frac{\mu_X}{\mu_T} = \frac{k_2}{k_1} \tag{31}$$

from its initial position, which is one mean step length from the origin because of the forced movement at $t = 0$.

$\sigma_t^2$, the variance of the travel distance at time $t$, is

$$\sigma_t^2 = \int_0^{\infty} (x - \mu_t)^2 f_t(x)\,dx = \frac{1}{k_1^2} + \frac{2k_2}{k_1^2}\,t = (1 + 2N_t)\,\sigma_X^2, \tag{32}$$

where $\sigma_X^2$ is the variance of the step length. The above relationship indicates that the particles disperse with the linear rate $2k_2/k_1^2$ in addition to the initial spread $1/k_1^2$. Since on the average there occur $N_t$ double phases, and since each single distance-step phase contributes one $\sigma_X^2$ to $\sigma_t^2$, the dispersion contributed by each single time-step phase is also equal to one $\sigma_X^2$. This is a rather interesting phenomenon.
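
The moments quoted for Einstein's $f_t(x)$ can be checked by direct numerical integration of the series in equation 29. What follows is a minimal sketch under stated assumptions: the names `einstein_density` and `moments`, the integration range `x_max`, the number of trapezoid steps, and the parameter values are all illustrative choices, not part of the original paper.

```python
import math


def einstein_density(x, t, k1, k2, max_terms=2000):
    """Equation 29: k1 exp(-k1 x - k2 t) * sum_{n>=0} (k1 x)^n (k2 t)^n / (n!)^2."""
    a = k1 * x * k2 * t      # each series term equals a^n / (n!)^2
    term = 1.0               # n = 0 term
    total = 1.0
    for n in range(1, max_terms):
        term *= a / (n * n)
        total += term
        if term < 1e-17 * total:
            break
    return k1 * math.exp(-k1 * x - k2 * t) * total


def moments(t, k1, k2, x_max=60.0, steps=6000):
    """Area, mean, and variance of f_t(x) by the trapezoid rule on [0, x_max]."""
    h = x_max / steps
    area = mean = second = 0.0
    for j in range(steps + 1):
        x = j * h
        w = 0.5 if j in (0, steps) else 1.0   # trapezoid end-point weights
        f = einstein_density(x, t, k1, k2)
        area += w * f * h
        mean += w * x * f * h
        second += w * x * x * f * h
    return area, mean, second - mean ** 2


if __name__ == "__main__":
    k1, k2, t = 1.0, 1.5, 2.0   # illustrative values only
    print(moments(t, k1, k2))
    print(1.0, 1.0 / k1 + k2 * t / k1, 1.0 / k1 ** 2 + 2.0 * k2 * t / k1 ** 2)
```

With these values the numerical area, mean, and variance should be close to 1, 4, and 7, in line with equations 30 and 32.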

Einstein also derived $f_t(x)$ for the case where all particles start with the rest period at $x = 0$ when $t = 0$ by a similar procedure. The result is the same as the one obtained by Hubbell and Sayre (10), which will be discussed in the following paragraphs.

The E.E.M. Derived by Hubbell-Sayre and Sayre-Hubbell (1965). - As presented earlier, Sayre and Hubbell (10, 17) obtained their E.E.M., that is, both the step length and the rest period are i.i.d. with exponential densities. The travel distance at the end of the $n$th step is gamma distributed with the following p.d.f.:

$$f_X^{(n)}(x) = \frac{k_1}{\Gamma(n)}\,(k_1 x)^{n-1} e^{-k_1 x} I_{(0,\infty)}(x), \quad n \ge 1. \tag{33}$$


Using the same assumptions, one may derive the formula for the probability that exactly $i$ jumps occurred in $(0, t]$, $p_i(t)$ or $p_t(i)$, as follows:

$$p_i(t + \Delta t) = p_i(t)\,[1 - k_2\Delta t] + p_{i-1}(t)\,k_2\Delta t. \tag{34}$$

This leads to the following differential equation:

$$\frac{dp_i(t)}{dt} = k_2\,[p_{i-1}(t) - p_i(t)]. \tag{35}$$

With the solution for $p_0(t)$ as given in equation 15 (with $k_2 t$ in place of $k_1 x$), equation 35 yields the solution

$$p_i(t) \text{ or } p_t(i) = \frac{(k_2 t)^i}{i!}\,e^{-k_2 t}, \quad i = 0, 1, 2, \ldots \tag{36}$$

It is the probability mass function (p.m.f.) of the Poisson distribution with parameter $k_2 t$.
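
The step from the difference relation of equation 34 to the Poisson p.m.f. of equation 36 can be illustrated by marching equation 34 forward in small time increments. This is a rough numerical sketch only; the function names `integrate_jump_counts` and `poisson_pmf`, the increment `dt`, and the parameter values are assumptions made for the example.

```python
import math


def integrate_jump_counts(k2, t_end, max_jumps, dt=1e-4):
    """Apply equation 34 repeatedly, starting from p_0(0) = 1 and p_i(0) = 0 for i > 0."""
    p = [0.0] * (max_jumps + 1)
    p[0] = 1.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        new_p = p[:]
        for i in range(max_jumps + 1):
            # Probability flowing in from the state with one fewer jump.
            inflow = p[i - 1] * k2 * dt if i > 0 else 0.0
            new_p[i] = p[i] * (1.0 - k2 * dt) + inflow
        p = new_p
    return p


def poisson_pmf(i, lam):
    """Equation 36 with lam = k2 * t."""
    return lam ** i * math.exp(-lam) / math.factorial(i)


if __name__ == "__main__":
    k2, t = 1.5, 2.0   # illustrative values only
    p = integrate_jump_counts(k2, t, 10)
    for i, value in enumerate(p):
        print(i, value, poisson_pmf(i, k2 * t))
```

The two printed columns should agree to about three decimal places for this `dt`; the agreement improves as `dt` is reduced, which mirrors the limiting argument in the text.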

Now let us consider the distribution function of the travel distance at time $t$, $F_t(x)$, for the process beginning with a rest phase at $x = 0$ when $t = 0$. If $N_t$ denotes the number of jumps occurring in $(0, t]$, then

$$F_t(x) = P[X \le x, T = t] = P\left[\bigcup_{n=0}^{\infty} (X \le x, T = t, N_t = n)\right].$$

Since all elements after the union operator are disjoint, one then has:

$$F_t(x) = \sum_{n=0}^{\infty} P[X \le x, T = t, N_t = n] = \sum_{n=0}^{\infty} P[(X \le x, N_t = n) \cap (T = t, N_t = n)].$$

The process is independent in both $t$ and $x$, and thus,

$$F_t(x) = \sum_{n=0}^{\infty} P[X \le x, N_t = n] \cdot P[T = t, N_t = n] \tag{37}$$

$$= \sum_{n=1}^{\infty} \left[\int_0^x f_X^{(n)}(s)\,ds\right] \cdot p_t(n) + P[X \le x, N_t = 0] \cdot p_t(0), \tag{38}$$

where $f_X^{(n)}(s)$ and $p_t(n)$ for $n \ge 1$ are given in equations 33 and 36, and $P[X \le x, N_t = 0] = 1$, $p_t(0) = e^{-k_2 t}$.

The following equation indicates that a probability mass $e^{-k_2 t}$ is centered at $x = 0$:

$$F_t(x) = \sum_{n=1}^{\infty} \left[\int_0^x f_X^{(n)}(s)\,ds\right] p_t(n) + e^{-k_2 t}, \quad x \ge 0. \tag{39}$$


For the particles which have already moved out of the origin, the distribution along $x$ at time $t$ is

$$f_t(x) = \frac{dF_t(x)}{dx} = \frac{d}{dx} \sum_{n=1}^{\infty} p_t(n) \int_0^x f_X^{(n)}(s)\,ds, \quad x > 0,$$

which yields the solution:

$$f_t(x) = \sum_{n=1}^{\infty} f_X^{(n)}(x)\,p_t(n), \tag{40}$$

or

$$f_t(x) = k_1 e^{-k_1 x - k_2 t} \sum_{n=1}^{\infty} \frac{(k_1 x)^{n-1}}{(n-1)!}\,\frac{(k_2 t)^n}{n!}. \tag{41}$$

The area under the curve $f_t(x)$ is equal to $1 - p_t(0) = 1 - e^{-k_2 t}$, which is less than unity since part of the particles stay at the origin at any time $t$. $\mu_t$, the mean travel distance at time $t$, is

$$\mu_t = \frac{k_2}{k_1}\,t = \frac{\mu_X}{\mu_T}\,t = N_t\,\mu_X. \tag{42}$$

Thus, the centroid of the particles moves from the origin at the same rate as given in equation 31. $\sigma_t^2$, the variance of the travel distance at time $t$, is

$$\sigma_t^2 = \frac{2k_2}{k_1^2}\,t = 2N_t\,\sigma_X^2. \tag{43}$$

Again, the dispersion contributed by the single time step is equivalent to that by the single distance step. Since the particles start with a rest phase, the original centroid and variance are zero. This is why equations 42 and 43 are different from equations 30 and 32 by one $\mu_X$ and one $\sigma_X^2$, respectively.
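
The statements about the mass at the origin and about equations 42 and 43 can also be examined by simulating individual particles that begin with a rest phase. The following is a hedged Monte Carlo sketch, not the authors' procedure: the function names, the fixed seed, the number of particles, and the parameter values are assumptions for illustration, and the sample statistics only approximate the theoretical values.

```python
import math
import random


def simulate_travel_distances(k1, k2, t, particles, seed=0):
    """Each particle rests Exp(k2), then steps Exp(k1) instantaneously, until time t."""
    rng = random.Random(seed)
    distances = []
    for _ in range(particles):
        clock = rng.expovariate(k2)           # end of the first rest period
        x = 0.0
        while clock <= t:
            x += rng.expovariate(k1)          # step of negligible duration
            clock += rng.expovariate(k2)      # next rest period
        distances.append(x)
    return distances


def summarize(distances):
    """Fraction still at the origin, sample mean, and sample variance."""
    n = len(distances)
    at_origin = sum(1 for x in distances if x == 0.0) / n
    mean = sum(distances) / n
    var = sum((x - mean) ** 2 for x in distances) / (n - 1)
    return at_origin, mean, var


if __name__ == "__main__":
    k1, k2, t = 1.0, 1.5, 2.0   # illustrative values only
    stats = summarize(simulate_travel_distances(k1, k2, t, 20000, seed=1))
    print(stats)
    print(math.exp(-k2 * t), k2 * t / k1, 2.0 * k2 * t / k1 ** 2)
```

The first printed line holds the simulated fraction at the origin, mean, and variance; the second holds the corresponding theoretical values $e^{-k_2 t}$, $k_2 t/k_1$, and $2k_2 t/k_1^2$.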

As proved by Einstein (4) and Shen and Cheong (19), $f_t(x)$ in both equations 29 and 41 approaches the normal distribution as $t \to \infty$.

Crickmore-Lean's Models. - Crickmore and Lean (2) started with a discrete model by assuming that all particles are allowed to make a jump with probability $p$ and constant step length $L$ at the discrete times $t_i = iT$, where $i = 0, 1, 2, 3, \ldots$, and $T$ is a fixed time duration. If a particle was released at $x = 0$ when $t = 0$, the probability of the particle staying at $x = x_i = i \cdot L$ at time $t = t_n = nT$ is given by

$$P[X = iL \text{ at } t = nT] = \binom{n}{i} p^i (1 - p)^{n-i}, \quad i = 0, 1, 2, \ldots, n; \qquad = 0 \text{ otherwise}. \tag{44}$$

This is a binomial distribution with parameters $n$ and $p$. The mean travel distance when $t = t_n = nT$ is

$$\mu_{t_n} = \sum_{i=0}^{n} (iL)\binom{n}{i} p^i (1 - p)^{n-i} = npL, \tag{45}$$

or

$$\mu_t = pL\,\frac{t}{T}. \tag{46}$$

The mean velocity of the particle is then approximately equal to

$$\bar{V} = \frac{d\mu_t}{dt} = \frac{pL}{T}. \tag{47}$$

And the variance is

$$\sigma_{t_n}^2 = np(1 - p)L^2, \tag{48}$$

or

$$\sigma_t^2 = p(1 - p)L^2\,\frac{t}{T}. \tag{48a}$$

When $np$ becomes larger, the binomial distribution in equation 44 can be approximated by the following normal distribution:

$$f_t(x) = \frac{1}{\sqrt{2\pi\sigma_t^2}} \exp\left[-\frac{(x - \mu_t)^2}{2\sigma_t^2}\right], \tag{49}$$

where $\mu_t$ and $\sigma_t^2$ are given in equations 46 and 48a. It should be pointed out that Crickmore and Lean neglected to put $L^2$ into the expression given for $\sigma_t^2$ in equation 48a.
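
The role of the factor $L^2$ in the variance can be seen by computing the moments of the binomial position distribution directly. The sketch below is illustrative only; the function names and the values of `n`, `p`, and `L` are assumptions, and `math.comb` requires Python 3.8 or later.

```python
import math


def binomial_position_pmf(n, p):
    """Equation 44: probability that the particle has made i jumps after n time steps."""
    return [math.comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(n + 1)]


def travel_moments(n, p, L):
    """Mean and variance of the travel distance i*L under the p.m.f. of equation 44."""
    pmf = binomial_position_pmf(n, p)
    mean = sum(i * L * w for i, w in enumerate(pmf))
    var = sum((i * L - mean) ** 2 * w for i, w in enumerate(pmf))
    return mean, var


if __name__ == "__main__":
    n, p, L = 50, 0.3, 0.2   # illustrative values only
    print(travel_moments(n, p, L))
    print(n * p * L, n * p * (1.0 - p) * L ** 2)
```

The directly computed variance matches $np(1 - p)L^2$ rather than $np(1 - p)$, which is the point made about equation 48a.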

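In terms of the step length $S$ and the rest period $R$, this model uses the following assumptions:

$$P[S = L] = 1, \tag{50}$$

$$P[R = t_i = iT] = \lim_{n \to \infty} \binom{n}{i} p^i (1 - p)^{n-i} = \frac{\lambda^i e^{-\lambda}}{i!}, \quad i = 0, 1, 2, \ldots, \tag{51}$$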


where $\lambda = np$ should be a constant. This well-known Poisson distribution, as derived from the binomial distribution, is introduced by most texts dealing with probability theory (such as the one by Parzen (14, p. 105)). This will lead to the exponential distribution for the rest period if one allows $T$ to become infinitesimally small, which is the case employed by Einstein (4) and Sayre and Hubbell (17) in their E.E.M. If one examines equations 36 and 51, one will readily find that $\lambda = k_2 t$ and $k_2 = p/T$. This derivation is quite similar to the one given by Einstein but not as good, since they did not go further and obtain the continuous case.

Another model was proposed by Crickmore and Lean to take care of the randomness of the step length. They assumed that the particle is allowed to make a jump at the discrete times $t_i = iT$, $i = 0, 1, 2, 3, \ldots$, with a step length $S$, which is governed by the following probability law:

$$P[S = iL] = c_i, \quad i = 0, 1, 2, 3, \ldots$$

Unfortunately, they were not able to find a general expression for the function $f_t(x)$. It should be pointed out here that if $c_i$ is further assumed to be

$$c_i = \lim_{n \to \infty} \binom{n}{i} p^i (1 - p)^{n-i} \tag{52}$$

and $L$ is allowed to be infinitesimally small, the distribution of the step length will become exponential as obtained by Einstein (4) and Sayre and Hubbell (17) in their E.E.M.

Crickmore and Lean stated in their paper that the frequency ($1/T$) of the particle movement depends on the depth in the ripple, and proposed the so-called two-layer model. The ripple layer is idealized as two sublayers of the same average thickness, and in a time interval, $T$, different proportions of the material in a strip of unit length move through a constant distance, $L$, from the two sublayers. This material is assumed to mix and return to the upper and lower sublayers in the same proportions to preserve continuity in each layer. The larger the proportion of bed material in a certain sublayer, the larger the frequency of the particle movement if one keeps the time interval $T$ constant for all sublayers. Although they could not derive a general expression for $f_t(x)$, they had already pointed out the important phenomenon that the mean rest period should increase as the particle is deposited deeper in the ripple layer. This is important if one wants to find the relation between $f_t(x)$ and the arrangement of the source (either a line source on the surface or a plane source uniformly distributed across the cross section through which the particles move). They further studied a special case of a three-layer model with similar assumptions.

One major contribution of Crickmore and Lean was to point out that the rest-period distribution should be a function of the elevation. Sayre and Conover (16) have used this concept to develop their general two-dimensional model, as will be discussed in the next paragraph.

Sayre-Conover's General Two-Dimensional Model. - Sayre and Conover developed a general two-dimensional model wherein the rest period distribution is considered to be a function of the position of the particle deposition, and the governing probabilistic laws for the step length and the rest period remain unspecified.

The following five assumptions were used by them to develop the model:

1. The step lengths $X_i$ are i.i.d. with $f_X(x)$, which assumes values for $x \ge 0$.

2. The step heights $Y_i$ are i.i.d. with $f_Y(y)$, which assumes values for $y_{\min} \le y \le y_{\max}$.

3. The rest period $T_i$ is independent of $T_j$ for $i \ne j$.

4. The rest period $T_i$ is independent of $Y(j - 1)$ for $i \ne j$, where $Y(n) = \sum_{j=1}^{n} Y_j$ is the elevation of the particle at the end of the $n$th jump.

5. $Y(i)$ is independent of $Y(j)$ for $i \ne j$.

They started from the joint distribution function of $X$ and $Y$ for a given time $t$.


$$F_t(x, y) = \sum_{n=0}^{\infty} P\left[\sum_{i=0}^{n} X_i \le x,\ \sum_{i=0}^{n} Y_i \le y,\ N_t = n\right]$$

$$= \cdots$$

$$= \sum_{n=1}^{\infty} \left[\int_0^x f_X^{(n)}(x')\,dx'\right] \int_{y_{\min}}^{y} f_Y(y') \int_0^t f_T^{(n)}(t') \int_{t-t'}^{\infty} f_{T|Y}(\tau \mid y')\,d\tau\,dt'\,dy' + P[Y_0 \le y,\ N_t = 0]. \tag{53}$$

The reader is referred to the original paper for intermediate steps. Since $P[Y_0 \le y, N_t = 0]$ is not a function of $x$, the joint density function of $X$ and $Y$ at time $t$ is the partial derivative of the first term in the right-hand side of the above equation, namely,

$$f_t(x, y) = \sum_{n=1}^{\infty} f_X^{(n)}(x)\,f_Y(y) \int_0^t f_T^{(n)}(t') \int_{t-t'}^{\infty} f_{T|Y}(\tau \mid y)\,d\tau\,dt'. \tag{54}$$

If a large number of particles are released simultaneously at $x = 0$, $y = Y_0$, and $t = 0$, the above equation expresses the longitudinal and vertical distribution at time $t$ of the particles which have moved from their respective initial positions. $f_t(x, y)$ is not a p.d.f. because it includes only the fraction of the particles which have moved from their initial position.

The above two-dimensional model can be reduced to a one-dimensional case by integrating equation 54 over $y$, or

$$f_t(x) = \int_{y_{\min}}^{y_{\max}} f_t(x, y)\,dy = \sum_{n=1}^{\infty} f_X^{(n)}(x)\,P[N(t) = n]. \tag{55}$$

This is called Sayre-Conover's general one-dimensional model. The E.E.M.'s derived by Einstein (4), Hubbell and Sayre (10), and Sayre and Hubbell (17) are special cases of the above model, obtained by assuming that both the step length and rest period are exponentially distributed.

Sayre and Conover's two-dimensional model is very general, but the assumption that $Y(i)$ is independent of $Y(j)$ for $i \ne j$ conflicts with the assumption that the $Y_i$ are i.i.d.: since $Y(i) = Y(i - 1) + Y_i$, $Y(i)$ cannot be independent of $Y(i - 1)$. This two-dimensional model is therefore not reliable and should be rederived without the fifth assumption. However, the one-dimensional model, as the marginal distribution of the unsound two-dimensional one, has been directly proven by Yang and Sayre (1971) without going through the two-dimensional model.

Yang and Sayre's G.E.M. - Yang and Sayre (22) found from their preliminary experiments (by observing the movements of single particles in flumes) that the rest period is exponentially distributed with the p.d.f. given in equation 10 and the step length is gamma distributed with the following p.d.f.:

$$f_X(x) = \frac{k_1}{\Gamma(r)}\,(k_1 x)^{r-1} e^{-k_1 x} I_{(0,\infty)}(x). \tag{56}$$

They then substituted

$$f_X^{(n)}(x) = \frac{k_1}{\Gamma(nr)}\,(k_1 x)^{nr-1} e^{-k_1 x} I_{(0,\infty)}(x) \tag{56a}$$

and

$$P[N_t = n] = \frac{(k_2 t)^n}{n!}\,e^{-k_2 t} \tag{57}$$

into equation 55 and obtained

$$f_t(x) = k_1 e^{-k_1 x - k_2 t} \sum_{n=1}^{\infty} \frac{(k_1 x)^{nr-1}}{\Gamma(nr)}\,\frac{(k_2 t)^n}{n!}. \tag{58}$$

This model will be called the gamma-exponential model, or, in short, G.E.M. The area under the curve $f_t(x)$, indicating the fraction of the particles which have left the origin at time $t$, is equal to $1 - P[N_t = 0] = 1 - e^{-k_2 t}$.

$\mu_t$, the mean travel distance at time $t$, is:

$$\mu_t = \frac{r}{k_1}\,k_2 t = \frac{\mu_X}{\mu_T}\,t = N_t\,\mu_X. \tag{59}$$

$\sigma_t^2$, the variance of the travel distance at time $t$, is:

$$\sigma_t^2 = (r + 1)\,\frac{r}{k_1^2}\,k_2 t = (r + 1)\,N_t\,\sigma_X^2. \tag{60}$$

The above relationship indicates that, on the average, each distance step contributes one $\sigma_X^2$ to the spreading and each time step contributes $r\sigma_X^2$ to the spreading.

Since the exponential distribution is only a special case of the gamma distribution with $r = 1$, the E.E.M.'s developed by Einstein (4) and Sayre and Hubbell (17) are special cases of this model. Some properties of this $f_t(x)$ were investigated by Shen and Cheong (19), who found that $f_t(x)$ will also approach the normal distribution for large $t$.

Shen-Todorovic's General One-Dimensional Model. - Shen and Todorovic (21) derived another version of equation 37, and then used the following basic assumptions to derive their general one-dimensional model by a nonhomogeneous compound Poisson process:

1. The sediment particle moves in a series of alternate rests and steps.

2. The particle always moves in the downstream direction.

3. The process in $X$ is independent of the process in $T$ (this important assumption was used by them although not specifically mentioned).

4. The probability of making two or more steps in the time period $(t, t + \Delta t]$ as $\Delta t \to 0$ should be of a higher order than $\Delta t$ and be neglected.

5. The information that a particle makes exactly a particular number of steps in $(t_0, t]$ has no influence on the probability of the particle making a step in the next time period $(t, t + \Delta t]$.

6. The time duration of a distance step is insignificant as compared with the rest period.

7. The total distance $X(n)$ traveled by the sediment particle after $n$ steps should not depend on the time intervals within $(t_0, t]$ in which those $n$ steps occurred.

As indicated earlier, the essential difference between Shen-Todorovic's model and Sayre and Hubbell's model is that the former model lets $k_1$ be a function of $x$ and $k_2$ be a function of $t$, while these two values are kept constant in a particular steady uniform flow by Sayre and Hubbell in their model.

The distribution function and the density function of the travel distance, $x$, at time, $t$, for a case where the particle starts with the rest phase at $x = x_0$ and $t = t_0$ are

$$F_t(x) = \exp\left[-\int_{x_0}^{x} k_1(x')\,dx' - \int_{t_0}^{t} k_2(t')\,dt'\right] \sum_{n=0}^{\infty} \sum_{j=n}^{\infty} A(n, j)$$

with

$$A(n, j) = \frac{\left[\int_{x_0}^{x} k_1(x')\,dx'\right]^j \left[\int_{t_0}^{t} k_2(t')\,dt'\right]^n}{j!\,n!} \tag{61}$$

and

$$f_t(x) = k_1(x) \exp\left[-\int_{x_0}^{x} k_1(x')\,dx' - \int_{t_0}^{t} k_2(t')\,dt'\right] \sum_{n=1}^{\infty} B(n)$$

with

$$B(n) = \frac{\left[\int_{x_0}^{x} k_1(x')\,dx'\right]^{n-1} \left[\int_{t_0}^{t} k_2(t')\,dt'\right]^n}{(n-1)!\,n!}, \quad x > x_0. \tag{61a}$$

If one puts $k_1(x) = k_1$, $k_2(t) = k_2$, $x_0 = 0$, and $t_0 = 0$, equation 61a becomes equation 41; thus, Sayre-Hubbell's model is a special case of this model.

Similar results can be derived for a case where the particle starts with the movement phase at $x = x_0$ and $t = t_0$. Einstein's equation 29 can then be obtained as a special case of it.

Summary. - Crickmore-Lean's first model (2), using a constant step length and a discrete random rest period, is, perhaps, the least accurate of all the models previously discussed. The normal distribution is valid only for large time. They treated both step length and rest period as discrete random variables in their second model, but, unfortunately, found no general solution. Their models can be made similar to the models by Einstein and Sayre-Hubbell with additional assumptions (such as allowing the small steps $L$ and $T$ to be infinitesimally small). Though without general solutions, their two-layer and three-layer models provide useful ideas for constructing a two-dimensional model with the conditional rest period distribution known.

The  E.E.M.'s  derived  by  Einstein  (4)  and  Sayre 
and  Hubbell  (17)  are  essentially  the  same,  but  they 
used  different  methods.  Both  distributions  of  the 
step  length  and  the  rest  period  are  exponential. 
This  model,  although  proven  to  be  invalid  by  Hung 
and  Shen  (11),  is  theoretically  sound. 

Sayre-Conover's two-dimensional model (16) is quite general. $f_t(x, y)$ is completely determined by the following three p.d.f.'s: (1) the step length, (2) the rest period given the elevation at which a particle is deposited, and (3) the elevation at which the particle is deposited. Unfortunately, they used two mutually exclusive assumptions in their derivations, namely, (1) the step heights $Y_i$ are i.i.d., and (2) the elevation $Y(i)$ is independent of $Y(j)$ for $i \ne j$. This model is hence questionable. Nevertheless, their general one-dimensional model can be derived from their two-dimensional one. $f_t(x)$ is determined by the step length distribution and the probability that exactly $n$ jumps occur in $(0, t]$, for $n = 0, 1, 2, \ldots$. In this model, the step length must be i.i.d., and the rest period is not necessarily i.i.d.

Yang-Sayre's G.E.M. is a special case of the above model, obtained by assuming that the step length and the rest period are gamma and exponentially distributed, respectively. The E.E.M. is a special case of this model, obtained by setting the shape parameter in the step length distribution to $r = 1$.

Shen-Todorovic's model (21) considered the
nonhomogeneous Poisson process. This model is
valid whether or not the step length and the rest
period are i.i.d., provided the processes in distance
and in time are independent of each other. Since
$k_1(x)$ and $k_2(t)$ are required in this model and
these two functions are not easy to determine from
experiments, Sayre-Conover's general one-
dimensional model is, perhaps, easier to treat.

Figure 3 schematically shows the interrelationship
among the various models. The models by Shen-
Todorovic (21) and Sayre-Conover (16) are perhaps
the most general and are physically related by the
description given in the table. Yang-Sayre's
model (22) is seen to be a special case of Sayre-
Conover's model.


The function $k_1(x)$, with respect to the gamma
density, is given in equation 28. However, it is known
that Yang-Sayre's model is not exactly a special
case of Shen-Todorovic's model. As expected, the
models by Einstein (4) and Sayre-Hubbell (17) are
special cases of the previous three models. Finally,
as stated previously, Crickmore-Lean's model (2)
is perhaps the least accurate model; it treats
the step length as a discrete constant and the rest
period as a discrete random variable. Crickmore-
Lean's model may approach the homogeneous
Poisson process with certain modifications.

Calculation    of    Bedload  Transport 
From  Stochastic  Models 

The two major goals of stochastic models are (1)
to determine the dispersion of bedload particles,
and (2) to find bedload transport rates. The first
item is characterized by $f_t(x)$, which was discussed
earlier and was further investigated by Shen and
Cheong (19). The second goal is discussed as
follows.

Hubbell-Sayre's Total Bedload Formula (10).—
Hubbell and Sayre developed a total bedload formula
that can be used with any of the stochastic models
given in the previous section. Consider a channel
of bed width $B$ with uniform bed material (specific
weight $\gamma_s$ and porosity $\lambda$), and assume that all
particles in the bed layer of depth $d$ are moved
with an average velocity $V$. The bedload discharge
$Q_s$, in weight per unit time, is

$Q_s = \gamma_s (1-\lambda) B d V.$ (62)
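
As a minimal sketch of how equation 62 would be evaluated, the function below multiplies the specific weight, the solid fraction, the bed width, the layer depth, and the mean particle velocity. The function name, argument names, and the numerical values are hypothetical and chosen only for illustration; they do not come from Hubbell and Sayre.

```python
def bedload_discharge(gamma_s, porosity, width, depth, velocity):
    """Bedload discharge in weight per unit time: Q_s = gamma_s (1 - lambda) B d V."""
    if not 0.0 <= porosity < 1.0:
        raise ValueError("porosity must lie in [0, 1)")
    return gamma_s * (1.0 - porosity) * width * depth * velocity


# Hypothetical inputs (any consistent unit system), not data from the paper.
Q_s = bedload_discharge(gamma_s=165.0, porosity=0.4, width=8.0, depth=0.5, velocity=0.01)
```

The only quantity that is hard to obtain in practice is the depth `depth` of the moving layer, which is why the estimate of d dominates the uncertainty of the result.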

Since  all  particles  are  moved  in  sequences  of  step 
lengths  and  rest  periods,  the  continuity  equation 
of  the  sediment  must  be  satisfied  in  a  statistical 
sense. 

If a bed consists of composite material, the total
load can be calculated by

$\sum_i Q_{s_i} = (1-\lambda) B \sum_i p_i (\gamma_s d V)_i$ (63)

where $p_i$ is the proportion of that particular group
of bed material.

The average particle velocity $V$ is given by the
ratio of the mean step length $\mu_x$ and the mean rest
period $\mu_T$, thus,

$V = \dfrac{\mu_x}{\mu_T}$ (64)


The values of $V$ are given by different models as:

Einstein (4) and Hubbell-Sayre (10):  $V = k_2/k_1$  (31)
Yang and Sayre (22):  $V = r k_2/k_1$  (59a)
Crickmore-Lean (2):  $V = pL/T$  (47)

The main difficulty in using equations 62 and 63
is the estimation of $d$. Hubbell and Sayre (10)
suggested a procedure to estimate $d$ from known
bed form distributions; however, it is rather difficult
to apply.

Einstein's Bedload Equation (5, 6).— Einstein
developed his bedload equation by introducing the
effect of the randomness of the dynamic force on
the bedload transport. The idea is that the
probability of a particle being eroded at a certain
position is equal to the probability that the dynamic
forces exerted on the particle there exceed the
particle's submerged weight.

[Figure 3 chart. Boxes, from most general to most
specific: theoretical derivation (equation 37, with
equations 24, 25, 24a, and 25a); Sayre-Conover's
G.O.M. (1967) (55); Shen-Todorovic's G.O.M. (1971)
(61a); Yang-Sayre's G.E.M. (1971) (58); Sayre-Hubbell's
E.E.M. (1965) (41); Crickmore-Lean's model (1962) (49).]

Figure 3.— Summary of all the stochastic models.

The number of particles passing a cross section of
unit width per unit time is

$\dfrac{q_b}{\gamma_s A_2 D^3}$ (65)

which should be equal to the number of particles
eroded per unit time in a reach of length $\mu_x$ (the
mean step length) and unit width, or

$\dfrac{\mu_x\, p}{A_1 D^2 t_1}$ (66)

where

$q_b$ = transport rate in dry weight per unit
time per unit width
$\gamma_s$ = specific weight of the particle
$D$ = diameter of the grain
$A_1 D^2$ = projected area of the grain
$A_2 D^3$ = volume of a grain
$p$ = probability for a particle to be eroded
$t_1$ = exchange time.

It is assumed that the particle moves in small
steps of fixed length $\lambda D$, where $\lambda$ is a constant.
Since the eroding probability is $p$ everywhere, the
step length is distributed with

$P[X = i\lambda D] = (1-p)\,p^{i-1}, \quad i = 1, 2, 3, \ldots$ (67)

and has the mean

$\mu_x = \dfrac{\lambda D}{1-p}$ (68)
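
The mean in equation 68 is the mean of a geometric distribution scaled by the small-step length. As a quick numerical sketch, assuming the step length takes the values i·λ·D with probability (1 − p)·p^(i−1), the truncated series below can be compared with the closed form λD/(1 − p). The function name and the sample values of p, λ, and D are hypothetical.

```python
def mean_step_length_series(p, lam, D, terms=10_000):
    """Truncated sum of i*lam*D*(1-p)*p**(i-1) for i = 1..terms."""
    return sum(i * lam * D * (1.0 - p) * p ** (i - 1) for i in range(1, terms + 1))


# Hypothetical values: erosion probability 0.3, lam = 100, grain diameter 0.002.
p, lam, D = 0.3, 100.0, 0.002
series_mean = mean_step_length_series(p, lam, D)
closed_form = lam * D / (1.0 - p)
```

The two numbers agree to rounding error, and the closed form shows that the mean step length grows without bound as p approaches 1.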


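Einstein assumed that the exchange time is
proportional to the time necessary for the particle to
fall in a still fluid through a distance equal to its
own diameter, so that

$t_1 = A_3 \dfrac{D}{w} = A_3 \left[\dfrac{D}{g}\,\dfrac{\gamma}{\gamma_s - \gamma}\right]^{1/2}$ (69)

From equations 65, 66, 68, and 69, we have

$p = \dfrac{A_* \Phi}{1 + A_* \Phi}$ (70)

where

$A_* = \dfrac{A_1 A_3}{A_2 \lambda}$

and

$\Phi = \dfrac{q_b}{\gamma_s} \left[\dfrac{\gamma}{\gamma_s - \gamma}\right]^{1/2} \left[\dfrac{1}{g D^3}\right]^{1/2}$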
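
Equation 70 is a one-to-one map between the erosion probability p and the transport intensity Φ. The sketch below, under the assumption that the relation has the form p = A*Φ/(1 + A*Φ), implements the map and its inverse; the function names and the sample value of A* are hypothetical.

```python
def erosion_probability_from_phi(phi, a_star):
    """p = A* Phi / (1 + A* Phi); p rises from 0 toward 1 as Phi grows."""
    return a_star * phi / (1.0 + a_star * phi)


def phi_from_erosion_probability(p, a_star):
    """Inverse relation: Phi = p / (A* (1 - p)), valid for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return p / (a_star * (1.0 - p))


# Round trip with an illustrative constant (not a fitted value).
a_star = 10.0
p = erosion_probability_from_phi(0.05, a_star)
phi_back = phi_from_erosion_probability(p, a_star)
```

Because the factor 1/(1 − p) comes from the mean step length, the transport intensity diverges as p approaches 1, which corresponds to particles that are almost never deposited.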


The dimensionless number $\Phi$ was called the
intensity of bedload transport by Einstein. Two
rates of bedload transport are dynamically similar
if their values of $\Phi$ are equal.

The probability $p$ given in equation 70 should be
equal to the probability that the lift force $L$ is
larger than the submerged weight $W_s$, or

$p = P\left[c_L \dfrac{\rho}{2} u^2 A_1 D^2 > (\gamma_s - \gamma) A_2 D^3\right]$ (71)

where $c_L$ is the lift coefficient and $u$ is the
instantaneous velocity. Einstein further assumed that
the lift force is equal to a mean value plus a normally
distributed random component $\eta$, which has mean
zero and fixed variance $\eta_0^2$, and found that:

$1 - \dfrac{1}{\sqrt{\pi}} \int_{-B_*\Psi - 1/\eta_0}^{B_*\Psi - 1/\eta_0} e^{-t^2}\,dt = \dfrac{A_* \Phi}{1 + A_* \Phi}$ (72)

where

$\Psi = \dfrac{\gamma_s - \gamma}{\gamma}\,\dfrac{D}{R_b' S}$ (73)

and $S$ is the energy slope and $R_b'$ is the hydraulic
radius attributed to the sediment grain of the bed.
Constants $A_*$ and $B_*$ were found to be 4.35 and
0.143, respectively. $\Psi$ is called the flow intensity
and can be used to discuss the similarity of the
flow as far as the bedload transport is concerned.
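
To illustrate how a relation of the form of equation 72 links the flow intensity Ψ to the transport intensity Φ, the sketch below evaluates the normal-probability integral with the error function. It assumes the commonly quoted form 1 − (1/√π)∫ e^(−t²) dt with limits −B*Ψ − 1/η₀ and B*Ψ − 1/η₀. The default 1/η₀ = 2.0 is an assumption that is not stated in the text, and the function names are hypothetical.

```python
import math


def erosion_probability(psi, b_star=0.143, inv_eta0=2.0):
    """Left side of the bedload equation: 1 - (1/sqrt(pi)) * integral of exp(-t^2)."""
    upper = b_star * psi - inv_eta0
    lower = -b_star * psi - inv_eta0
    # (1/sqrt(pi)) * integral_lower^upper exp(-t^2) dt = 0.5 * (erf(upper) - erf(lower))
    return 1.0 - 0.5 * (math.erf(upper) - math.erf(lower))


def transport_intensity(psi, a_star, b_star=0.143, inv_eta0=2.0):
    """Solve p = A* Phi / (1 + A* Phi) for Phi at the given flow intensity."""
    p = erosion_probability(psi, b_star, inv_eta0)
    if p >= 1.0:
        return math.inf
    return p / (a_star * (1.0 - p))


# A* = 4.35 and B* = 0.143 are the constants quoted in the text.
phi_values = [transport_intensity(psi, a_star=4.35) for psi in (5.0, 10.0, 20.0)]
```

In this form the erosion probability, and hence Φ, decreases monotonically as Ψ increases, so a larger Ψ corresponds to weaker transport.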

Einstein's well-known bedload equation is
definitely the most comprehensive one among the
existing models. Four major weaknesses are found:
(1) too many untested assumptions are made and
too many empirically determined constants are
used; (2) the length of the small step is arbitrarily
assumed to be constant for a certain size of particle;
(3) $A_3 D/w$ is used as a characteristic time scale
without any solid evidence; and (4) the randomness
of the bed profiles is not accounted for.

In determining $N_e$, the number of particles
eroded per unit width per unit time, Einstein
considered that all particles move with fixed step
lengths, $\mu_x$. This conflicts with reality, since
the particles are seen to move with random step
lengths. McNown (13) reasoned that the density
function of the step length should be zero at $x = 0$
and approach zero as $x \to \infty$, and selected a special
case of the family of skewed frequency curves of
Pearson type III:

$f_X(x) = x e^{-x} I_{(0,\infty)}(x)$ (74)

to determine $N_e$. This is actually a special case of
the gamma density with $k_1 = 1$ and $r = 2$, which has
its mean $\mu_x = r/k_1 = 2$.

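As shown in figure 4, the probability that $X > x$ is

$\int_x^{\infty} f_X(s)\,ds = 1 - F_X(x) = (1 + x) e^{-x}$

and the total number of particles that cross the
section A-A per unit width per unit time is

$N_e = \dfrac{p_s}{A_1 D^2} \int_0^{\infty} (1 + x) e^{-x}\,dx = \dfrac{2 p_s}{A_1 D^2}$ (75)

where $p_s t_1 = p$.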


Figure  4. —  Sketch  used  to  justify  Einstein's  use  of  constant  step 
lengths. 


The number 2 in the above equation has the
dimension of length and is equal to the mean step
length, $\mu_x$, rather than ". . . is approximately three
fourths of the mean" as stated by McNown. This
means that if the step length is distributed with the
density given by equation 74, Einstein's use of a
fixed step length is legitimate. Moreover, it can easily
be shown that Einstein's result always holds if the
density of the step length, $f_X(x)$, is supported on
$x > 0$, since the following relation holds and is
distribution free:

$N_e = \dfrac{p_s}{A_1 D^2} \int_0^{\infty} [1 - F_X(x)]\,dx = \dfrac{p_s}{A_1 D^2} \int_0^{\infty} x f_X(x)\,dx = \mu_x \dfrac{p_s}{A_1 D^2}$ (76)
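
The distribution-free step in equation 76 is the identity that, for a nonnegative random variable, the integral of 1 − F(x) equals the mean. The following sketch checks it numerically for the density x·e^(−x) of equation 74, whose survival function is (1 + x)·e^(−x). The trapezoid helper and the truncation of the upper limit at 50 are illustrative choices, not part of the original derivation.

```python
import math


def trapezoid(f, a, b, n=100_000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def density(x):
    """Step length density of equation 74 (gamma with r = 2, k1 = 1)."""
    return x * math.exp(-x)


def survival(x):
    """P[X > x] = (1 + x) exp(-x) for the density above."""
    return (1.0 + x) * math.exp(-x)


# The upper limit 50 stands in for infinity; the neglected tail is negligible.
integral_of_survival = trapezoid(survival, 0.0, 50.0)
mean_from_density = trapezoid(lambda x: x * density(x), 0.0, 50.0)
```

Both integrals come out at the mean step length of 2, which is the factor appearing in equation 75.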


Experimental Evidence on $f_t(x)$ and
Their Implications

Einstein (4). Sediment used.— Uniform gravels
which were retained between two square-meshed
sieves of 24 mm. and 34 mm. were used. Three
classes of different shapes were further divided
according to the smallest diameters, $d_c$, of the
particles, namely, (1) spherical, 24 mm. < $d_c$ < 34
mm.; (2) average, 17 mm. < $d_c$ < 24 mm.; and (3)
flat, 0 mm. < $d_c$ < 17 mm.

Experimental procedure.— Small amounts of
colored particles were released at $x = 0$ when the
water with sediment was flowing at equilibrium
conditions. The flow was halted after a desired
period, and the distribution of colored particles
was determined.

Conclusions  from  24  runs  of  uniform  gravel 
experiments.  — 

1. The first two moments as given in equations
30 and 32 were used by Einstein to estimate the
parameters $k_1$ and $k_2$. It was found that the E.E.M.
fitted his experimental results reasonably well.

2.  The  fitted  step  length  is  independent  of  the 
flow  conditions  but  is  larger  for  particles  near  the 
spherical  shape.  It  decreases  from  5.01  mm.  to 
2.66  mm.  and  1.65  mm.  for  the  above-mentioned 
three  classes. 

3. The fitted mean rest period is independent of
the particle shape and is equal to $110\, q_s^{-2/3}$, where
$q_s$ is the specific sediment discharge in liters per
second per meter.

4. The fitted particle travel velocity, $V$, which is
the ratio of the mean step length, $\mu_x$, to the mean
rest period, $\mu_T$, is proportional to $q_s^{2/3}$, with the
proportionality constant increasing from flat to
spherical shapes. Under the same $q_s$, the spherical
particles move at three times the speed of the flat ones.

5. $d$, the average thickness of the bedload layer,
is equal to $q_s$ divided by $V$. Using the mean $V$ of
the three classes, that is, $V = 0.0230\, q_s^{2/3}$ m./sec,
one has

$d = 43.5\, q_s^{1/3}$ (mm.). (77)
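
The coefficient 43.5 in equation 77 follows from dividing the specific discharge by the mean particle velocity after converting liters per second per meter to square meters per second. The sketch below makes that unit bookkeeping explicit. It assumes a power law V = c·q^a with c = 0.0230; the exponent a = 2/3 is an assumption read from the damaged source, and the function name is hypothetical.

```python
def bedload_layer_thickness_mm(q_s, c=0.0230, a=2.0 / 3.0):
    """Layer thickness d = q_s / V in millimeters.

    q_s is the specific sediment discharge in liters per second per meter;
    V = c * q_s**a is the particle velocity in meters per second (assumed form).
    """
    q_m2_per_s = q_s * 1.0e-3        # 1 liter/s per meter = 1e-3 m^2/s
    velocity = c * q_s ** a          # m/s
    return 1.0e3 * q_m2_per_s / velocity  # m -> mm


thickness_at_unit_discharge = bedload_layer_thickness_mm(1.0)
```

At unit discharge the exponent drops out and the thickness is 1/0.0230, about 43.5 mm, which reproduces the coefficient in equation 77 regardless of the assumed exponent.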

Note  that  bedload  motion  occurred  without  dunes 
or  ripples  only  when  d=D  (the  particle  size).  Dunes 
were  formed  on  the  bed  when  d  >  D.  Not  all  parti- 
cles on  the  top  layer  move  at  the  same  time  when 
d<D. 

6.  Each  step  length  is  found  to  be  composed  of 
a  series  of  small  steps  slightly  less  than  the  average 
size. 

Conclusions  from  five  runs  of  mixture  experi- 
ments. — 

1.  The  fitted  mean  step  length  for  each  shape  of 
particles  behaves  the  same  in  the  mixture  as  in  the 
uniform  sediment. 

2. The fitted mean rest period is proportional to
$q_s^{-1}$ instead of $q_s^{-2/3}$ as in the uniform sediment.

3. The fitted $V$ is proportional to $q_s$ instead of
$q_s^{2/3}$ as in the uniform sediment. The reason is that
the particles move with a fixed depth in the mixture.

Comments by the writers.— 1. The E.E.M. should
not be accepted merely because it fits the
$f_t(x)$-curve of a particular time nicely. It is believed
that this curve can be fitted equally well by some other
models if both the first and the second moments are
used. If several $f_t(x)$-curves were measured at
different times under the same conditions and if
the fitted $k_1$ and $k_2$ for all curves could be considered
to be fixed numbers, the E.E.M. could then be
accepted.

2. No values of $\mu_x$ and $\mu_T$ could be obtained by
Einstein for some of his experiments. He presented
several plausible reasons without questioning the
correctness of his model. As will be discussed
later, this model has been proven to be incorrect
by Hung and Shen (11).

3. Since Einstein's E.E.M. is incorrect, the fitted
mean step length and mean rest period are
meaningless. However, the conclusions drawn for $V$
and $d$ are still valid since they are distribution free.

Hubbell and Sayre (10) and Sayre and Hubbell
(17).— Particles tagged with radioactive Iridium-192
and Antimony-122 were used as the tracers in both
field and flume studies. The tracers were released
instantaneously as a line source in a two-dimensional
flow. A series of concentration-distribution
curves was obtained at different times under the
same flow conditions. The concentration-distribution
function $\phi_t(x)$ is related to $f_t(x)$ by

$\phi_t(x) = \dfrac{W}{B d}\, f_t(x)$ (78)

where $W$ is the total weight of the tracer particles
placed in the channel, $B$ is the width of the channel,
and $d$ is the average depth beneath the bed surface
to which the tracer particles are distributed.

Two characteristics of the concentration-distribution
function were used to estimate $k_1$ and $k_2$,
namely, (1) the speed of its mode, $x_m$, that is,

$\dfrac{dx_m}{dt} \approx \dfrac{k_2}{k_1}$ (79)

and (2) the time rate of attenuation of the peak
relative concentration, per foot of width; relative
concentration is defined as the ratio of the
concentration at a point to the area under the
concentration-distribution curve, $A$, that is,

$\dfrac{\phi_t(x)_{\max}}{A} = f_t(x)_{\max}$ (80)

The slope of the curve of $x_m$ versus time was first
used to find $k_2/k_1$. Two log-log plots, which represent
the relation between the observed $\phi_t(x)_{\max}/A$ and
time and the theoretical relation between $f_t(x)_{\max}/k_1$
and $k_2 t$, were used to supply the other required
relation between $k_1$ and $k_2$. They noted that the
observed curve should fall on the theoretical one
after a proper shift without rotation if they are
superimposed one over the other.

They found that the concentration-distribution
curves derived from their E.E.M. agree quite well
with the observed ones.

Yano-Tsuchiya-Michiue (23).— In addition to
Einstein and Sayre-Hubbell, Yano-Tsuchiya-
Michiue also verified the E.E.M. by experiment.
They presented the variance of the travel distance
for a given time, $\sigma_t^2$, to be approximately $2k_2 t/k_1^2$.
Actually it can be proven that $\sigma_t^2$ is exactly equal
to $2k_2 t/k_1^2$.
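
The exact variance quoted above is the variance of a compound Poisson sum: with exponential rest periods of rate k2 and exponential step lengths of rate k1, the travel distance at time t has mean k2·t/k1 and variance 2·k2·t/k1². The Monte Carlo sketch below checks this under the assumption that the particle starts with a rest period at the origin. The function names, the parameter values, and the sample size are illustrative.

```python
import random


def travel_distance(t, k1, k2, rng):
    """Distance traveled by time t: exponential rests (rate k2), exponential steps (rate k1)."""
    clock = rng.expovariate(k2)      # first rest period, spent at the origin
    x = 0.0
    while clock <= t:
        x += rng.expovariate(k1)     # instantaneous step
        clock += rng.expovariate(k2) # next rest period
    return x


def sample_mean_and_variance(t, k1, k2, n=20_000, seed=1):
    """Sample mean and unbiased sample variance of n simulated travel distances."""
    rng = random.Random(seed)
    xs = [travel_distance(t, k1, k2, rng) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var


k1, k2, t = 2.0, 1.5, 10.0
mean, var = sample_mean_and_variance(t, k1, k2)
theory_mean = k2 * t / k1            # 7.5
theory_var = 2.0 * k2 * t / k1 ** 2  # 7.5
```

The sample moments should scatter around the theoretical values, with the agreement tightening as the sample size grows.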

Their experiment was very similar to that by
Einstein (4). Two separate groups of sediment with
sizes 3.5 mm. (specific gravity 2.65) and 6.75 mm.
(specific gravity 1.24) were used, and a group of
mixed sand with sizes of 1.20, 3.24, and 7.03 mm.
was also tested. Their results seem to agree with
Einstein's E.E.M. quite well. The probability that a
particle remained at the origin at various times was
found experimentally to be an exponential function
as

$P[X_t = 0] = t_0\, e^{-k_2 t}$, (81)

where $t_0$ was not unity due to the difficulty of
setting the initial condition correctly (according to
the investigators). The time rates of the movement
of the centroid and the spreading of the particles,
that is, equations 42 and 43, were used to calculate
$k_1$ and $k_2$. This method of moments was also used
by Yang and Sayre (1971) to estimate their
parameters.

Their  conclusions  for  both  uniform  sediment  sizes 
are  as  follows: 

1. The fitted mean step length, $\mu_x = 1/k_1$, is
found to be a multiple, $M$, of the sediment diameter.
$M$ appears to be a constant between 80
and 250 for $F$ < 0.1, and $M$ increases with $F$ for larger
$F$, where $F$ is


F  =  ^^^ 


(82) 


where $\Psi$ is Einstein's flow intensity and $\Psi_c$ refers
to incipient motion.

2. Let $p$ be the probability for a particle to be
eroded as used in Einstein's equation 66; then

$p = k_2 t_1 \approx k_2 \left[\dfrac{D}{g}\,\dfrac{\gamma}{\gamma_s - \gamma}\right]^{1/2} = \dfrac{1}{\mu_T} \left[\dfrac{D}{g}\,\dfrac{\gamma}{\gamma_s - \gamma}\right]^{1/2}$ (83)

where $t_1$ is the exchange time as given in equation
69. It is found experimentally that $p$ increases
sharply with the increase of $F$, and hence the fitted
$\mu_T$ decreases rapidly with $F$. Since the step length is
fairly constant for $F$ < 0.1, the increase in bedload
transport associated with the increase in $F$ is
solely because the particles jump more frequently.
3.  The  nondimensionalized  mean  travel  velocity 


V 

w 


y 


ys-y 


1/2 


(84) 


increases  with  the  increase  of  F  with  the  following 
relation: 


gD 


ys  -  y 


1/2  /II 

=  4.5  (  


(85) 


A  corresponding  bedload  formula  is  found  as 


Qr  /  1       1  \  ' 


(86) 


where $q_b$ is the specific bedload discharge in
m.²/sec, $u_*$ is the shear velocity in m./sec, and $d$
is the flow depth.

The  conclusions  obtained  from  their  experi- 
ments of  mixtures  are  as  follows: 

1.  Larger  particles  move  faster. 

2. $V/w$ seems to be independent of the flow
intensity, $F$.

3.  The  step  length  is  shorter  in  mixtures  than  in 
uniform  bed. 

4. $p$ is larger for coarser gravel, and is much
larger in the mixture than in the uniform bed. The
trend of $p$ increasing with $F$ is seen to have a
steeper slope in the mixture than in the uniform bed.
This point is not clearly stated by the authors.

5. Due to the acceleration effect, coarser
particles have a larger discharge in the mixture than in
the uniform bed. On the other hand, finer particles
have a smaller discharge in the mixture than in the
uniform bed due to the hiding effect. The measured
hiding factor

(Theoretical bedload for uniform bed) / (Measured bedload for the given mixture) (87)

is found to be smaller than that proposed by
Einstein (6).

Yang-Sayre (22).— Yang and Sayre carried out
five runs of experiments using techniques similar to
those employed by Hubbell and Sayre (10). Several
$f_t(x)$-curves were measured for each run. Three
equations were used to determine $r$, $k_1$, and $k_2$ in
their G.E.M., namely,

$\dfrac{d\mu_t}{dt} = \dfrac{r k_2}{k_1}$ (59a)

$\dfrac{d\sigma_t^2}{dt} = \dfrac{r(r+1) k_2}{k_1^2}$ (60a)

and

$S\sqrt{t} = \dfrac{r+2}{\sqrt{r(r+1) k_2}}$ (88)

where $S$ is the skewness coefficient, defined
as the ratio of the third central moment of the travel
distance to the variance raised to the 3/2 power. It
should be noted that the authors were wrong to
use $k_2$ instead of $\sqrt{k_2}$ in equation 88.

Their experimental data indicate that $\mu_t$ and
$\sigma_t^2$ generally vary linearly with time as predicted
by their G.E.M. Sometimes, this relationship does
not hold for small $t$ due to the difficulty of releasing
the tracer particles instantaneously. The skewness
parameter, $S\sqrt{t}$, instead of being a constant as
predicted theoretically, seems to change
continuously with time. No satisfactory asymptotic
behavior, as claimed by the authors, is apparent in
their graphs. The asymptotic lines given by them
for $S\sqrt{t}$ at larger $t$ are quite arbitrary. Note that
$S\sqrt{k_2 t}$ is confined between 1 and 2, and the curve
is quite flat for $r$ > 1. Thus, a small error in $S\sqrt{k_2 t}$
would result in vastly different $r$ values. It is
entirely unreliable to use the skewness in estimating
$r$ and hence $\mu_x$ and $\mu_T$.
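
The sensitivity argument above can be made concrete. For a compound Poisson process with gamma step lengths of shape r, the skewness parameter is S√(k2·t) = (r + 2)/√(r(r + 1)), which decays slowly toward 1 as r grows. The sketch below evaluates this function and inverts it in closed form, so that a small perturbation of the skewness can be translated into the implied change in r. The function names and the 5 percent perturbation are illustrative.

```python
import math


def skewness_parameter(r):
    """S * sqrt(k2 * t) = (r + 2) / sqrt(r (r + 1)) for gamma steps of shape r."""
    return (r + 2.0) / math.sqrt(r * (r + 1.0))


def shape_from_skewness(s):
    """Positive root of (s^2 - 1) r^2 + (s^2 - 4) r - 4 = 0, valid for s > 1."""
    a = s * s - 1.0
    b = s * s - 4.0
    return (-b + math.sqrt(b * b + 16.0 * a)) / (2.0 * a)


s_true = skewness_parameter(2.0)
r_recovered = shape_from_skewness(s_true)
# Shape implied by a skewness that was measured 5 percent too low.
r_perturbed = shape_from_skewness(0.95 * s_true)
```

Comparing `r_perturbed` with `r_recovered` shows that the relative error in r is several times the relative error in the skewness, and the amplification grows as the curve flattens at larger r.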

The error in $S\sqrt{t}$ due to the inaccuracy of $r$ is
small enough that one can estimate the function
$f_t(x)$ without causing unbearable error. This is
especially true for large $r$ and $t$. For this purpose,
one may arbitrarily choose $r$ and then determine
$k_1$ and $k_2$ from equations 59a and 60a. It should be
pointed out that the assumed $r$ and the estimated
$k_1$ and $k_2$ should not be used to determine the step
length and the rest period distributions.
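
The procedure of choosing r and solving for the two rate parameters can be sketched as follows, assuming the moment relations dμ/dt = r·k2/k1 and dσ²/dt = r(r + 1)·k2/k1². Dividing the first by the second gives k1, and substituting back gives k2. The function name and the sample slopes are hypothetical; setting r = 1 gives the corresponding E.E.M. estimates.

```python
def fit_rates(dmean_dt, dvar_dt, r=1.0):
    """Method-of-moments estimates of (k1, k2) for an assumed gamma shape r.

    Uses dmean/dt = r*k2/k1 and dvar/dt = r*(r+1)*k2/k1**2, so that
    (dmean/dt) / (dvar/dt) = k1 / (r + 1).
    """
    if dmean_dt <= 0.0 or dvar_dt <= 0.0 or r <= 0.0:
        raise ValueError("slopes and shape parameter must be positive")
    k1 = (r + 1.0) * dmean_dt / dvar_dt
    k2 = k1 * dmean_dt / r
    return k1, k2


# Slopes generated from k1 = 2, k2 = 1.5, r = 2 (illustrative numbers).
k1_hat, k2_hat = fit_rates(dmean_dt=1.5, dvar_dt=2.25, r=2.0)
```

Different assumed values of r reproduce the same two slopes with different (k1, k2) pairs, which is why the fitted parameters say little about the underlying step length and rest period distributions.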

The behavior of the function $f_t(x)$ for given sets
of $k_1$ and $k_2$ with arbitrarily chosen values of $r$ was
investigated by Shen and Cheong (19).

In order to avoid the use of the skewness
parameter, Hubbell and Sayre (in the closure of the
discussion of their paper) suggested estimating $k_2$
from the rest period distribution, which in turn can
be directly evaluated from the continuous record
of bed elevation measured at any station. If the
particles are assumed to be entrained again only
when they are reexposed after they were buried at
the end of the last jump, the time interval under the
elevation-time curve between two successive crossing
points for a given elevation is logically equal to
a particular rest period at that elevation. After one
obtains the conditional rest period distributions
given the elevation, one may determine the rest
period distribution as the marginal distribution of
the conditional one.

Two  points  should  be  mentioned  here: 

1. This concept may be accepted for a uniform
grain bed where the particle might deposit on the
lee side of the bed form with constant probability
along the slope. It will be pointed out later that
different sizes of particles will deposit on the
downstream slope according to different probabilistic
laws in case the bed material is not uniform. Hubbell
and Sayre's technique fails to correctly estimate $k_2$
in this case.

2. Since the rest period is considered to be
exponentially distributed in their E.E.M., this
technique can be used only when the rest period
distribution is exponential.

Summary  and  discussions  of  methods  used  in 
estimating  the  parameters.— 

1. Einstein (4) — $k_1$ and $k_2$ were estimated from the
observed $\mu_t$ and $\sigma_t^2$ for a particular time.

2. Hubbell-Sayre (10) — $k_1$ and $k_2$ were estimated
from the speed of the peak of the concentration-
distribution curve and the relation between the peaks
of $f_t(x)$ and $\phi_t(x)$. Since several $\phi_t(x)$'s were
measured in sequence, the data were not independent
of each other.

3. Yano-Tsuchiya-Michiue (23, 24) — $k_1$ and $k_2$
were estimated from $d\mu_t/dt$ and $d\sigma_t^2/dt$, which were
determined from a series of independently observed
$f_t(x)$-curves.

4. Yang-Sayre (22) — $r$, $k_1$, and $k_2$ were estimated
from the skewness coefficient of the travel distance
in addition to the two relations used by Yano-
Tsuchiya-Michiue. They suggested assuming $r$ and
determining $k_1$ and $k_2$ from $d\mu_t/dt$ and $d\sigma_t^2/dt$, and
stated that one may use these three values to
estimate $f_t(x)$ for large $t$ without large error.
Yang-Sayre's data were not independent of each
other.

5. Comments — It is better to use a set of
independent $f_t(x)$-curves to estimate the required
parameters. If some unusual discrepancy occurs
at an early stage, then all the $f_t(x)$-curves
taken after that will be subject to the same large
error. If such a set of data is used, one runs a
risk as large as if one uses only one curve taken at a
particular time, as Einstein did.

Implications of these experimental results.—
Satisfactory agreement with the data was claimed by
every researcher for the selected stochastic
model. Since the first few moments or their
counterparts, which characterize the position, the
spread (and the skewness in Yang-Sayre's case), were
used to estimate the parameters required to fit the
$f_t(x)$-curves, no model should be accepted merely
because of a good fit. Stronger arguments should
be used to check them. Another way is to measure
the distributions of the step length and the rest
period themselves, as will be described in the
following section.

The variations of the mean step length and the
mean rest period with the flow are questionable
since the models used to estimate them are
questionable.

Experiments  of  Single  Particle 
Movement 

Preliminary Experiments by Yang-Sayre (22) and
Yano-Tsuchiya-Michiue (23).— Yang and Sayre (22)
carried out experiments in a small flume with width
20 cm., depth 20 cm., and length 10 m. White
plastic particles with $d_{50}$ = 2.2 mm. and specific
gravity 1.1 served as bed material. Black
particles of the same size and same properties were
used as tracers. The movements of the black
particles on dune beds with an average dune length
between 2.2 and 5.5 feet were recorded. The rest
periods were recorded by a stopwatch, and the step
lengths were directly measured on the flume. It
was found that the step length was gamma
distributed with $r = 2$, and the rest period was
exponentially distributed, that is, they obtained the
G.E.M. No details of their data were given, and the
conclusion was rather questionable because the
measuring technique employed was preliminary.

Another set of experiments was done by Yano-
Tsuchiya-Michiue (23). Their data confirmed the
E.E.M. No detailed discussion of these experiments
is possible since neither the procedure of taking the
data nor the data themselves were given in their
papers.

Grigg  (8,  9). —  The  first  intensive  experimental 
study  on  the  movement  of  single  particles  was 
done  by  Grigg  (8,  9).  In  his  experiments,  an  auto- 
matic sensing  technique  (A.S.T.)  was  employed 
to  save  manpower.  The  A.S.T.  is  described  below 
to  make  the  further  discussion  possible. 

A  scintillation  detector  was  mounted  on  an  in- 
strumental carriage  which  was  allowed  to 
continuously  traverse  in  a  test  zone  of  a  fixed  length 
of  40  feet  and  a  cycle  time  of  a  constant  period  of 
6.8  minutes.  The  instrument  outputs  the  continuous 
record  of  radiation  intensity  on  a  strip  chart  record 
with  the  position  indicated  by  a  series  of  event 
markers.  Particle  sightings,  that  is,  the  peak  posi- 
tions of  the  recorded  radiation-intensity,  were 
obtained  when  the  carriage  passed  immediately 
above  the  particle  and  then  corresponding  times 
can  be  obtained  by  their  positions  on  the  chart 
paper  since  the  chart  paper  moved  with  constant 
speed.  Each  time  a  particle  position  changed  between 
two  successive  sightings,  it  was  assumed  that  the 
particle  had  moved  at  a  point  in  time  midway  be- 
tween the  two  sightings.  The  rest  periods  and  step 
lengths  can  then  be  obtained  from  the  above  record. 
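
The reduction of sightings to step lengths and rest periods described above can be sketched as follows. The sketch assumes one sighting per carriage pass, recorded as a (time, position) pair, and places each move at the midpoint in time between the two sightings that bracket it. The function name, the tolerance argument, and the sample record are hypothetical and are not Grigg's data.

```python
def steps_and_rests(sightings, tol=0.0):
    """Reduce a time-ordered list of (time, position) sightings.

    A move is registered whenever the position changes by more than tol
    between two successive sightings, and is dated at the midpoint in time.
    Rest periods are the intervals between successive registered moves.
    """
    move_times = []
    step_lengths = []
    for (t0, x0), (t1, x1) in zip(sightings, sightings[1:]):
        if abs(x1 - x0) > tol:
            move_times.append(0.5 * (t0 + t1))
            step_lengths.append(x1 - x0)
    rest_periods = [b - a for a, b in zip(move_times, move_times[1:])]
    return step_lengths, rest_periods


# Illustrative record: time in minutes (6.8-minute cycle), position in feet.
record = [(0.0, 1.0), (6.8, 1.0), (13.6, 3.5), (20.4, 3.5), (27.2, 3.5), (34.0, 5.0)]
steps, rests = steps_and_rests(record)
```

Any number of real steps and rests that fall between two successive sightings are merged into a single sensed step, and the sensed move times are tied to the sampling cycle, which is the kind of distortion of the recorded distributions that a later examination of the technique has to account for.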

His  results  suggested  that  the  step  length  is 
gamma  distributed  with  density  given  in  equation 
56  and  the  rest  period  distribution  is  either  ex- 
ponential with  density  given  in  equation  10  or  gamma 
with  density  similar  to  equation  56.  The  G.E.M. 
was  accepted  by  Grigg  because  of  the  convenience. 

Some  important  conclusions  are  given  below: 

1. $f_{T|Y}(t \mid y)$ = ae^"-^-"^^"\, where α and β are
constants varying with the flow condition. This was
obtained by the method suggested by Hubbell and
Sayre as described previously. Note that the
parameter is a function of the elevation $y$.

2.  The  mean  step  length  increased  with  the  in- 
crease of  dune  length.  The  same  result  was  obtained 
by  Yang  and  Sayre  (22).  The  distribution  of  bed 
form  length  was  gamma. 

3.  The  mean  step  length  increased  with  the 
increase  of  the  stream  power. 

4.  The  mean  rest  period  decreased  as  the  velocity 
of  the  bed  form  increased. 

5. A check of Yang-Sayre's G.E.M. (22) was
done. The prediction of $\mu_t$ from the $r$ and
$k_1$ obtained by Grigg was comparable to Yang and
Sayre's experimental result under similar
conditions. However, the prediction of $\sigma_t^2$ is much
smaller than the values obtained by Yang and Sayre
in all cases. Two possible reasons were given by
Grigg: (1) the incorrectness of the data and (2) the
invalidity of the G.E.M.

Hung and Shen (11) discussed Grigg's technique
and the implications of his data. This will be
presented in the next section. Here we discuss a
possible reason for the extremely large measured
$\sigma_t^2$ compared with the prediction from Grigg's data.

In order to measure the concentration-distribution
function while the water was flowing, Yang and Sayre
must have had the carriage traveling fairly fast.
Unfortunately, this velocity is not indicated. It is
commonly known that there is a time lag for the
counting rate of the scintillation detector to reach its
constant rate. This effect will usually make the
measured peak occur some distance downstream from
the real peak. Thus, the mean velocity of the tracers
obtained by Yang and Sayre might be larger than the
real value. It is possible that if the strength of the
radioactivity along the flume is measured by a fast-
moving detector, the fraction of radioactive rays not
registered by the detector is much larger
for larger strength (not in proportion but to some
power of it); the measured curve will then be lowered
and flattened and, after being normalized to unit
area, will have a larger variance. This idea is
shown schematically in figure 5.

Figure 5. — Deformation of the concentration-distribution curve.

Hung and Shen's Computer Simulation Work (11). —
Hung and Shen examined the A.S.T. used by Grigg
to obtain his first intensive set of data. They found
that some step lengths (or rest periods) would be
combined into a larger step length (or rest period)
by the detection technique. The question was raised
whether this effect could transform an exponential
distribution of the step length into a bell-shaped
one, which, in turn, was fitted with a gamma
density by Grigg.

They  found  that  the  step  length  distribution 
will  be  deformed  by  lowering  the  density  for  small 
step  length  and  raising  the  density  for  large  step 
length.  However,  they  found  by  using  the  computer 
simulation  method  that  if  the  E.E.M.  is  accepted, 
the  deformed  step  length  distribution  cannot  be 
bell-shaped  as  obtained  by  Grigg's  experiments. 
This points out that the E.E.M. cannot be valid
at  least  under  the  condition  with  which  Grigg  took 
his  data,  and  that  Grigg's  data  can  be  used  only 
when  they  are  properly  corrected. 

Owing  to  the  cyclic  sensing  of  the  detector,  the 
sensed  step  lengths  have  strong  tendencies  to 
center  at  multiples  of  the  half  cycle  time.  It  was 
proven  that  about  two-thirds  of  the  data  will  center 
at  even  multiples  of  the  half  cycle  time  and  the  other 
one-third, the odd multiples. That is, there exists a
strong periodic effect superimposed on the density
of the sensed rest period. This phenomenon was
seen not only in the results obtained by the computer
simulation but also in Grigg's data, although
Grigg was not aware of it.

A correction procedure for Grigg's data based
on the E.E.M. was found by Hung and Shen. They
also pointed out that similar procedures may be
employed with different assumed models.

Recent Studies on Single Particle
Movement by the Writers

Recognizing the importance of the single particle
movement study, the discrepancies in the sampling
techniques used by previous researchers, and
the lack of good data, the writers developed a
manual tracing technique in which the movement of
particles is followed manually with the help of
radioisotopes. For this technique, theoretical
consideration of the error introduced, namely, the
combination of small rest periods, leads to the
conclusion that if the step length is exponentially
distributed, then the sensed step length will follow
another exponential distribution; that is, the sensed
step length distribution should be J-shaped.
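
The statement above can be checked numerically with a short simulation. This is only an illustrative sketch, not the writers' or Hung and Shen's program: the merging rule (a rest period shorter than a hypothetical resolution `t_min` goes unnoticed, so the steps on either side of it are recorded as one), the rates, and all function names are assumptions introduced here.

```python
import random

def sense_steps(n_steps, step_rate, rest_rate, t_min, seed=0):
    """Return (true_steps, sensed_steps) for exponential steps and rests.

    Assumption: a rest period shorter than t_min is not detected, so the
    two steps it separates are recorded as a single, longer step.
    """
    rng = random.Random(seed)
    true_steps = [rng.expovariate(step_rate) for _ in range(n_steps)]
    # One rest period follows every step except the last.
    rests = [rng.expovariate(rest_rate) for _ in range(n_steps - 1)]

    sensed = []
    current = true_steps[0]
    for rest, step in zip(rests, true_steps[1:]):
        if rest < t_min:
            current += step        # rest missed: the two steps merge
        else:
            sensed.append(current)
            current = step
    sensed.append(current)
    return true_steps, sensed

def fraction_below(values, limit):
    """Fraction of the values that are smaller than limit."""
    return sum(v < limit for v in values) / len(values)

true_steps, sensed = sense_steps(20000, step_rate=1.0, rest_rate=1.0, t_min=0.5)
print("number of true and sensed steps:", len(true_steps), len(sensed))
print("fraction shorter than 0.25:",
      fraction_below(true_steps, 0.25), fraction_below(sensed, 0.25))
```

Under this merging rule the sensed sample has fewer short steps and more long ones than the true sample, but its histogram still decreases monotonically (it stays J-shaped), in line with the argument above.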

The  movement  of  the  particle  in  the  three- 
dimensional  space  (x,  y,  z)  was  closely  followed  by 
the  writers  in  an  8-foot  flume  according  to  the 
manual  tracing  technique.  This  enabled  the  writers 
to  study  the  dispersion  of  sand  particles  in  the  three- 
dimensional  space. 

Step Length Distribution. — The sensed step
length distribution is bell-shaped; it can be fitted by a gamma
density with r > 1 but not by an exponential density.
By the argument given in the first paragraph of this section, we again
conclude that the step length cannot be exponentially
distributed. Other means should be found to
support  the  conclusion  that  the  step  length  is  gamma 
distributed.  An  example  is  shown  in  figure  6. 

Rest Period Distribution. — The rest period
distribution is seen to be J-shaped. Without proper
correction, it cannot be nicely fitted by a gamma
density with r < 1 or by an exponential density. The
deviation is clearly shown in the probability distribution
function  of  the  rest  period.  The  proper  correction 
procedure  is  now  being  studied.  An  example  of  the 
rest  period  distribution  is  shown  in  figure  7. 

Deflection Angle. — The deflection angle, defined as
$\theta = \tan^{-1}(\Delta z/\Delta x)$, where $\Delta x$ is the step length in
the flow direction and $\Delta z$ is that in the transverse
direction, is an important variable in describing
the lateral dispersion. It is found that $\theta$ is
distributed with a symmetric bell-shaped density.
The fitted normal density seems to be less sharp
than the sensed data. An example is shown
in figure 8.


The Probability That the Particle Will Deposit at
Elevation $H^* = h^*$. — ($h^*$ is the normalized height
and is equal to $h/h_0$, $0 < h^* < 1$; $h$ is the elevation
from trough to any point on the downstream slope,
and $h_0$ is the particular trough to peak height of
a given dune.) In order to determine this probability,
the bed material along the downstream slope of a
dune was sampled after the water was drained. The
downstream slope was divided into 10 zones, and
the sampled size distribution in each zone was
used to find the fraction of a given size deposited
in each zone; this in turn gives
the probability for a particle of that size to deposit
in the particular zone. It is seen from figure 9 that
median sized particles have uniform probability
of depositing on any increment of $h^*$. Coarse material
has a larger probability of falling in the lower zone,
and the reverse is true for fine particles. This
phenomenon happened in our experiments with fairly
uniform particles. Thus, it should be more pronounced with
graded material. Hydraulic sorting is closely
related to this mechanism. The fact that $f_{H^*}(h^*)$
is different for different sizes should be properly
treated  not  only  in  the  y-dispersion  but  also  in  the 
longitudinal  dispersion.  This  phenomenon  also 
rules  out  the  use  of  yit)  to  find  /r|v  and  then 
frit)  for  the  bed  of  mixture. 

Other  Researchers.  — With  their  own  data,  the 
writers  are  trying  to  investigate  the  "independence" 
assumption  used  by  various  researchers  in  deriving 
their  models.  The  probability  density  for  a  particle 
to  deposit  at  any  y  is  also  under  investigation. 
Finally,  the  use  of  all  the  data  in  x-,  y-,  and  z- 
dispersion  is  being  planned. 

Conclusions  and  Future  Studies 

To  use  the  stochastic  models  to  solve  the  bedload 
transport  problem  is  a  promising  approach  since  the 
randomness  of  the  very  mechanism  of  the  bedload 
movement  has  been  taken  care  of. 

The order of generality of all existing models is
evaluated and the relationships among them are
found in this paper. As shown in figure 3, Sayre-Conover's
G.O.M. (16) and Shen-Todorovic's
G.O.M. (21) are of the highest order. The former
is more general in the sense that it does not
ignore any high-order terms, while the latter is
more general in the sense that it does not rule out
the cases where the step lengths are not i.i.d. Yang-Sayre's
G.E.M. can be obtained from Sayre-Conover's
G.O.M. Although the corresponding $k_1(x)$ has been
found, the G.E.M. is not exactly a special case of
Shen-Todorovic's G.O.M. Sayre-Hubbell and Einstein's
E.E.M. can be obtained from all of the above
three models. Crickmore-Lean's model (2) is the
most inaccurate one.


Einstein and Hubbell-Sayre's E.E.M. assumed
that the probability for a particle to make a jump
in any $(x, x + \Delta x]$ or $(t, t + \Delta t]$ is constant and can
be written as $k_1 \Delta x$ or $k_2 \Delta t$. In a steady uniform flow
over a plane bed, we can at most state that the step
lengths and the rest periods are i.i.d. and that their
distributions are not functions of either x or t. This has
nothing to do with the constancy of the above
probabilities. They can be constant only when the
step length and the rest period are exponentially
distributed. This model, though theoretically sound,
has been proven by Hung and Shen to be a wrong one.

Shen and Todorovic relaxed the above assumptions
and assumed that the probabilities are
$k_1(x)\Delta x$ and $k_2(t)\Delta t$. This does not mean that the
distribution of X is a function of x and the distribution
of T is a function of t. The $k_1(x)$ for gamma
distributed step length is given in equation 28 and
is not a constant. Shen and Todorovic's model is
important not because the step length distribution
can vary with x and the rest period distribution can
vary with t, but because it allows one to have different
distributions for the step length and the rest
period by varying the functions $k_1(x)$ and $k_2(t)$.
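
One way to read $k_1(x)$ is as the conditional rate at which a step in progress ends at distance x, that is, the hazard rate $f(x)/[1 - F(x)]$ of the step length distribution. Under that reading, which is an assumption of this sketch (equation 28 is not reproduced here), an exponential density gives a constant rate, while a gamma density with r = 2 gives a rate that grows with x. The function names and parameter values are illustrative only.

```python
import math

def hazard_exponential(x, k):
    """f(x) / (1 - F(x)) for an exponential density with rate k."""
    density = k * math.exp(-k * x)
    survival = math.exp(-k * x)
    return density / survival

def hazard_gamma_r2(x, k):
    """f(x) / (1 - F(x)) for a gamma density with shape r = 2 and rate k."""
    density = k * k * x * math.exp(-k * x)
    survival = (1.0 + k * x) * math.exp(-k * x)
    return density / survival

for x in (0.5, 1.0, 2.0, 4.0):
    print(x, hazard_exponential(x, 2.0), hazard_gamma_r2(x, 2.0))
```

For the exponential case the printed rate equals k at every x; for the gamma case it rises from zero toward k, so no single constant can represent it.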


Figure 7. — Sensed rest period distribution taken by the manual tracing technique. (Horizontal axis of each panel: rest period, t (min).)

If $k_1(x)$ and $k_2(t)$ can be determined by experiments,
one may further specify the processes both
in x and in t.

Sayre-Conover's G.O.M. is probably the best one
since it represents the function $f_t(x)$ by the step
length distribution and the probability that exactly
n jumps occur in (0, t] instead of $k_1(x)$ and $k_2(t)$,
which are not easy to find.

Grigg's data can be used to determine the distributions
of the step length and the rest period
only  when  they  are  corrected.  The  conclusions 
drawn  by  Grigg,  such  as  the  validity  of  the  G.E.M., 
based  on  these  uncorrected  data,  are  questionable. 

Our data indicate that the step length distribution
is bell-shaped and the rest period distribution
is J-shaped. This does not guarantee that the G.E.M.
is  valid. 

To arrive at any firm conclusions as to what
distributions the step length and the rest period follow,
attention should be paid to the correction of the
errors inherent in the particular sampling techniques.
Hung and Shen's work (11) indicates the
potentiality  of  using  the  computer  simulation 
method  (or  the  Monte-Carlo  technique)  to  serve 
this  end  if  the  theoretical  derivation  is  impossible. 

The  deflection  angle,  which  is  an  important  vari- 
able in  describing  the  lateral  dispersion  of  the  bed- 
load  particles,  is  found  to  be  distributed  with  a 
symmetric bell-shaped density as given in figure 8.

The probability density of the deposit elevation
is found to vary with particle size. As indicated in
figure 9, it is uniform for the median sized particles.
The coarser particles have a larger probability of
depositing at lower elevations, and the reverse is
true for finer particles.

Figure 8. — Distribution of the deflection angle. (Panels: Run I, No. 3 and Run I, No. 4; horizontal axis: deflection angle, $\theta$, in units of $\pi/24$ radians.)

Figure 9. — Probability distribution of the normalized deposition. (Panels: coarse material, d > 2.00 mm; medium material, 1.41 mm > d > 1.00 mm; fine material, 0.500 mm > d > 0.354 mm; horizontal axis: probability density of the elevation of deposit, $f_{H^*}(h^*)$.)

Future  studies  are  suggested  as  follows: 

•  To  check  the  assumptions  used  by  different 
researchers  to  derive  their  models  by  a  compre- 
hensive set  of  good  data. 

•  To  carry  out  enough  experiments  of  the  move- 
ment of  single  particles  to  determine  the  distri- 
butions of  the  step  length  and  the  rest  period  with 
parameters as functions of flow.

•  To  investigate  theoretically  the  properties  of 
$f_t(x)$ based on the above results so that one can
grasp  the  nature  of  the  dispersion  of  bedload 
particles. 

•  To  study  the  distributions  of  the  step  length 
and  the  rest  period  with  the  bed  form  at  the  same 
time  since  they  are  closely  related. 

•  To  develop  the  stochastic  models  for  the  steady 
nonuniform  flows  where  the  distributions  of  the  step 
length  and  the  rest  period  are  functions  of  distance 
x only.

Acknowledgment

We wish to express our thanks to the National
Science Foundation for their financial support of
this study through an NSF grant.



MISCELLANEOUS  PUBLICATION  NO.  1275,  U.S.  DEPARTMENT  OF  AGRICULTURE 


Literature  Cited 


(1) Bailey, N. T. J.
1964. THE ELEMENTS OF STOCHASTIC PROCESSES WITH APPLICATION TO THE NATURAL SCIENCES. 249 pp. John Wiley & Sons, Inc.

(2) Crickmore, M. J., and Lean, G. H.
1962. THE MEASUREMENT OF SAND TRANSPORT BY THE TIME-INTEGRATION METHOD WITH RADIOACTIVE TRACERS. Roy. Soc. (London) Proc. A270, pp. 27-47.

(3) Du Boys, P.
1879. ETUDES DU REGIME DU RHONE ET L'ACTION EXERCEE PAR LES EAUX SUR UN LIT A FOND DE GRAVIERS INDEFINIMENT AFFOUILLABLE. Ann. des Ponts et Chaussees, ser. 5, v. 18, pp. 141-195.

(4) Einstein, H. A.
1937. DER GESCHIEBETRIEB ALS WAHRSCHEINLICHKEITSPROBLEM [BEDLOAD TRANSPORT AS A PROBLEM OF PROBABILITY]. Mitt. der Versuchsanstalt für Wasserbau, an der Eidg. Tech. Hochsch. in Zürich. 110 pp. Verlag Rascher & Co.

(5) Einstein, H. A.
1942. FORMULAS FOR THE TRANSPORTATION OF BED LOAD. Amer. Soc. Civ. Engin. Trans. 107: 561-577.

(6) Einstein, H. A.
1950. THE BED-LOAD FUNCTION FOR SEDIMENT TRANSPORTATION IN OPEN CHANNEL FLOWS. U.S. Dept. Agr. Soil Conserv. Serv. Tech. Bul. 1026, 71 pp.

(7) Graf, W. H.
1971. HYDRAULICS OF SEDIMENT TRANSPORT. 513 pp. McGraw-Hill Book Co.

(8) Grigg, N. S.
1969. MOTION OF SINGLE PARTICLES IN SAND CHANNELS. 142 pp. Ph.D. Dissertation, Civ. Engin. Dept., Colo. State Univ., Fort Collins.

(9) Grigg, N. S.
1970. MOTION OF SINGLE PARTICLES IN ALLUVIAL CHANNELS. Amer. Soc. Civ. Engin. Jour. Hydraul. Div. 96(HY12): 2501-2518.

(10) Hubbell, D. W., and Sayre, W. W.
1964. SAND TRANSPORT STUDIES WITH RADIOACTIVE TRACERS. Amer. Soc. Civ. Engin. Jour. Hydraul. Div. 90(HY3): 39-68.

(11) Hung, C. S., and Shen, H. W.
1971. ANALYSIS OF SAMPLING TECHNIQUES FOR SEDIMENT-PARTICLE RANDOM WALKS. Proc. Sedimentation Symposium, June 17-19, 1971. Univ. Calif. Berkeley.

(12) Kalinske, A. A.
1947. MOVEMENT OF SEDIMENT AS BED LOAD IN RIVERS. Amer. Geophys. Union Trans. 28(4): 615-620.

(13) McNown, J. S.
1942. DISCUSSION OF "FORMULA FOR THE TRANSPORTATION OF BEDLOAD," BY H. A. EINSTEIN. Amer. Soc. Civ. Engin. Trans. 107: 591-594.

(14) Parzen, E.
1960. MODERN PROBABILITY THEORY AND ITS APPLICATIONS. 464 pp. John Wiley & Sons, Inc.

(15) Raudkivi, A. J.
1967. LOOSE BOUNDARY HYDRAULICS. 331 pp. Pergamon Press.

(16) Sayre, W. W., and Conover, W. J.
1967. GENERAL TWO-DIMENSIONAL STOCHASTIC MODEL FOR THE TRANSPORT AND DISPERSION OF BED-MATERIAL SEDIMENT PARTICLES. V. 2, pp. 88-95. Internatl. Assoc. Hydraul. Res. 12th Cong. Fort Collins, Colo.

(17) Sayre, W. W., and Hubbell, D. W.
1965. TRANSPORT AND DISPERSION OF LABELED BED MATERIAL, NORTH LOUP RIVER, NEBRASKA. U.S. Geol. Survey Prof. Paper 433-C, 48 pp.

(18) Shen, H. W.
1971. WASH LOAD AND BED LOAD, CHAPTER 11; TOTAL SEDIMENT LOAD, CHAPTER 13. In Shen, H. W. (editor and publisher), RIVER MECHANICS. Vol. 1 (P.O. Box 606), Fort Collins, Colo.

(19) Shen, H. W., and Cheong, F. H.
1971. DISPERSION OF CONTAMINATED BEDLOAD PARTICLES. Symposium on Statistical Hydrology, Internatl. Assoc. for Statis. in Phys. Sci., Tucson, Ariz.

(20) Shen, H. W., and Hung, C. S.
1971. AN ENGINEERING APPROACH TO TOTAL BED-MATERIAL LOAD BY REGRESSION ANALYSIS. Proc. Sedimentation Symposium, Univ. Calif. Berkeley.

(21) Shen, H. W., and Todorovic, P.
1971. A STOCHASTIC SEDIMENT TRANSPORT MODEL. Internatl. Symposium on Stochastic Hydraul., May 31-June 1, 1971. Univ. Pittsburgh.

(22) Yang, C. T., and Sayre, W. W.
1971. STOCHASTIC MODEL FOR SAND DISPERSION. Amer. Soc. Civ. Engin. Jour. Hydraul. Div. 97(HY2): 265-288.

(23) Yano, K., Tsuchiya, Y., and Michiue, M.
1969a. TRACER STUDIES ON THE MOVEMENT OF SAND AND GRAVEL. Internatl. Assoc. Hydraul. Res. 13th Cong. V. 2, pp. 121-129. Kyoto, Japan.

(24) Yano, K., Tsuchiya, Y., and Michiue, M.
1969b. STUDIES ON THE SAND TRANSPORT IN STREAMS WITH TRACERS. Disaster Prevention Res. Inst. Bul. V. 18, part 3, No. 141. Kyoto Univ., Kyoto, Japan.



USE  OF  AUTOREGRESSIVE  RUNOFF  MODELS  IN  RESERVOIR  STUDIES 

By Stephen J. Burges


Abstract 

The  problem  of  determining  an  adequate  model 
to  generate  streamflow  volumes  for  use  in  a  reservoir 
size  determination  study  is  examined  from  a  systems 
analysis  viewpoint.  A  brief  history  of  runoff  models 
used  in  reservoir  studies  is  given  to  illustrate  how 
available  data  and  technology  have  influenced  the 
degree  of  model  sophistication  employed.  A  simple 
annual  Markov  model  is  used  to  illustrate  problems 
associated  with  the  mechanics  of  generation,  par- 
ticularly with  respect  to  the  adequacy  of  Monte 
Carlo  sampling,  to  ensure  procedural  accuracy.  The 
sensitivity  of  reservoir  storage  requirements  to 
slight  changes  in  model  parameter  values  illustrates 
parameter  ranges  where  even  the  simplest  sto- 
chastic runoff  models  are  critically  limited  by 
input  data  uncertainties.  For  such  cases,  the  end 
product  of  an  operation  study  can  be  extremely 
uncertain. 

There  clearly  is  a  limit  to  the  advances  that  can 
be  made  in  engineering  hydrology  by  considering 
watershed  runoff  as  a  stochastic  process.  When 
runoff variability is large, more realistic streamflow
volume  models  can  be  constructed  by  modeling 
precipitation  as  a  stochastic  process  and  routing 
stochastically  generated  precipitation  through  a 
deterministic  watershed  model  thus  more  accurately 
reflecting  the  complex  interactions  that  occur  within 
the  watershed. 

Introduction 

I  have  attempted  to  outline  some  of  the  problems 
that  hydrologists  face  when  they  resort  to  statistical 
methods  of  runoff  generation.  It  is  important,  how- 
ever, that  we  use  good  judgment  whenever  a  deter- 
ministic event  is  represented  by  a  lumped-system 
model  such  as  a  statistical  model  of  runoff  volumes. 

Fundamental to any model is the problem for
which it is designed. We should therefore ask, for
a given problem: what is the ideal model, what lesser
model could we accept, and what should be the
philosophy of modeling?

We  must  be  careful  to  differentiate  between 
engineering  and  scientific  hydrology.  Engineering 
hydrology  principally  uses  macro  explanations  of 
natural  phenomena  so  that  workable  models  can 
be  designed  to  aid  solution  of  engineering  problems. 
Scientific  hydrology  is  principally  based  upon  find- 
ing causative  factors,  usually  on  a  micro  basis  — 
sometimes  this  information  can  be  incorporated 
conveniently  into  macro  models  and  used  in  en- 
gineering applications.  Clearly,  the  former  problems 
under  some  circumstances  can  be  modelled  statis- 
tically. It  is  doubtful  that  the  latter  category  can 
receive  much  help  from  stochastic  analyses. 

I  have  chosen  to  discuss  uses  and  models  of 
streamflow  data  when  the  watershed  can  be 
treated  as  a  lumped  system  particularly  with  respect 
to  determining  what  size  reservoir  should  be  built 
to  guarantee  some  specified  supply  reliability.  This 
is  not  necessarily  a  good  criterion  for  selecting 
reservoir  size,  but  it  is  a  helpful  concept  to  explore 
the  adequacy  of  a  model. 

Basic  Reservoir  Sizing  Problem 

The problem is to determine what size to build
the reservoir under some constraint concerning
the supply of the desired demand sequence. Provided
that some measured flows are available, the
inflow sequence might be treated as either random
or partially deterministic with some fluctuation
about a deterministic trend. This approach is
predicated upon there being no regulation above
the point of interest such as a major lake or reservoir
or diversion structure.

One of the major problems the hydrologist faces
lies in modeling the inflow into the proposed reservoir
or the inflow to a reservoir of known or
chosen size. The former case is one where approximate
sizing is required, whereas the latter is usually
required for an operation study to determine releases
consistent with some criteria of operation, be they
physical, economic, or both.

There have been a few basically similar approaches
to solve the so-called reservoir problem.
For  all  such  approaches,  it  is  instructive  to  view 
the  attempted  solutions  from  a  simplified  systems 
analysis  point  of  view,  that  is,  what  are  the  ob- 
jectives of  the  approach  (models),  the  criteria  by 
which  their  effectiveness  can  be  evaluated,  the  avail- 
able resources,  the  constraints,  and  management 
of  the  system.  The  final  important  circular  element, 
feedback,  is  often  missing. 

The  accuracy  of  hydrologic  modeling  per  se  in 
some  water  resource  problems  is  not  of  great 
significance  (Close  et  al.  (4))  in  terms  of  the  total 
problem  formulation.  However,  for  the  cases  where 
hydrologic  modeling  is  important,  we  definitely 
need  to  have  available  models  that  are  sophisticated 
and  that  are  economical  to  use.  Lumped-system 
watershed  models  (including  stochastic  models) 
have  a  definite  role  in  water  resource  analyses;  it 
is  important  that  this  technology  be  fully  developed. 

Probably  the  first  useful  approach  to  using  the 
observed  flow  sequence  to  size  a  reservoir  was 
made by Rippl in 1883 (13). He used the historical
streamflow  information  and  assumed  that  for  the 
climatic  region  he  was  working  in,  the  historical 
flow  sequence  could  be  assumed  to  be  the  future 
time  input  to  a  reservoir.  His  approach  introduced 
the  well-known  mass  curve  analysis;  the  maximum 
cumulative  difference  between  inflow  and  demand 
was  assumed  to  be  the  storage  required.  This 
method  has  obvious  shortcomings:  there  is  only  a 
remote  possibility  that  an  observed  streamflow 
record  will  repeat  itself  in  the  future;  what  should 
one  do  in  the  case  when  say  n  years  of  record  exists 
and the design life of the reservoir is to be m years?
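
The mass curve calculation can be written in a few lines. The sketch below is one common way of computing it, not necessarily Rippl's original graphical procedure: it assumes the reservoir starts full, lets any surplus spill, and returns the largest cumulative shortfall of inflow below demand. The function name and the flow values are hypothetical.

```python
def rippl_storage(inflows, demands):
    """Largest cumulative shortfall of inflow below demand.

    Assumption: the reservoir starts full, and surplus inflow refills it
    but cannot overfill it (the excess spills).
    """
    shortfall = 0.0   # current depletion below full
    required = 0.0    # largest depletion seen so far
    for q, d in zip(inflows, demands):
        shortfall = max(0.0, shortfall + d - q)
        required = max(required, shortfall)
    return required

inflows = [10, 4, 3, 12, 9, 2, 5, 14]   # hypothetical annual volumes
demands = [6] * len(inflows)            # constant hypothetical demand
print(rippl_storage(inflows, demands))
```

Because the result depends entirely on the one sequence supplied, a different but equally likely sequence would give a different storage, which is the shortcoming noted above.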

Hazen (7) recognized these problems and
offered an alternative approach whereby he calculated
the probability that a reservoir of size S
would be at specified percentages of emptiness.
Hazen combined the records for 14 streams to
obtain a record of 300 years length. The true
combined record obviously was much shorter
(see Fiering (6)) because of interrelation of records.
He routed inflow minus demand through a semi-infinite
reservoir (that is, did not allow for spill)
for several demand levels. The number of occurrences,
$N_i$, that the reservoir had storage $S_i$ yields
the probability $P_i = N_i/T$ (T is the total number of
discrete inflow events) that the reservoir would
have a volume equal to $S_i$. This was the basis of
the depletion probability method. Its shortcomings
were recognized by Barnes (2) and Sudler (14).
These three investigators recognized the inadequacy
of a single use of the historical trace; they attempted
to use the information contained in the record. Only
Sudler's approach required a simple mathematical
model for streamflow volumes.

The next different approach was made by Moran,
and variations on his theme were made by Gani,
Prabhu, and Lloyd (see Lloyd (9)). The basic
assumptions (for the initial cases examined by
Moran) were as follows:

1.  The  elements  of  the  inflow  sequence  are 
independent. 

2.  Storage  is  finite  and  fixed. 

3.  Draft,  storage  capacity,  and  inflow  are  all 
integral  multiples  of  a  unit  quantity. 

Later approaches considered serial correlation of
inflow (9). The methodology yields the steady state
probability that the volume of stored water will be
in the ith storage state. The fluctuations in reservoir
contents were treated as a homogeneous, finite
Markov chain. This approach, like the basic
approach of Hazen, is useful to indicate long-term
system storage volume probabilities but is of little
use in economic analyses where the analyst wants
to know what benefits will accrue if the reservoir
is of a specified size and the demand sequence is
known. Such an analysis necessitates an operation
study.
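
A minimal sketch of this kind of calculation follows. It assumes independent integer inflows, a fixed integer capacity and draft, and one particular order of events within a period (inflow, then spill above capacity, then draft); that order, the function name, and the inflow probabilities are assumptions of this sketch rather than details taken from Moran or Lloyd. The steady state is found by repeatedly applying the transition probabilities.

```python
def moran_steady_state(inflow_probs, capacity, draft, sweeps=2000):
    """Steady-state probabilities of the storage states 0..capacity-draft.

    inflow_probs[x] is the probability of an inflow of x units.
    Assumed order within a period: inflow, spill above capacity, draft.
    """
    n_states = capacity - draft + 1
    # transition[i][j] = P(next storage = j | current storage = i)
    transition = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        for x, p in enumerate(inflow_probs):
            filled = min(i + x, capacity)   # water above capacity spills
            nxt = max(filled - draft, 0)    # release the draft, or all there is
            transition[i][nxt] += p

    dist = [1.0 / n_states] * n_states      # arbitrary starting distribution
    for _ in range(sweeps):
        dist = [sum(dist[i] * transition[i][j] for i in range(n_states))
                for j in range(n_states)]
    return dist

# Hypothetical inflow of 0, 1, 2, or 3 units per period.
print(moran_steady_state([0.2, 0.3, 0.3, 0.2], capacity=4, draft=1))
```

The output gives long-run storage probabilities only; it says nothing about the benefits obtained from a particular demand sequence.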

The Harvard Water Team, Maass and coworkers
(10), employed stochastic runoff generation models
in reservoir operation studies. Runoff was modeled
for a particular (well behaved) river by a first-order
autoregressive scheme, assuming the runoff volumes
to be normally distributed. The basic normal
Markov model for annual streamflow is


$$q_i = \mu + \rho\,(q_{i-1} - \mu) + t_i\,\sigma\sqrt{1-\rho^2} \qquad (1)$$

and for monthly streamflow is

$$q_{i,j} = \mu_j + \rho_j\,\frac{\sigma_j}{\sigma_{j-1}}\,(q_{i-1,j-1} - \mu_{j-1}) + t_i\,\sigma_j\sqrt{1-\rho_j^2} \qquad (2)$$

where $\mu$, $\sigma$, and $\rho$ are population estimates of the mean,
standard deviation, and first-order serial correlation;
i is a sequence index; j is a monthly index; and t is
a unit random normal variate.
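
The annual model of equation 1 translates directly into code. The sketch below is illustrative: the function name, the warmup length, and the parameter values are assumptions, and the starting value is simply drawn from the marginal normal distribution.

```python
import math
import random

def generate_annual_flows(mu, sigma, rho, n_years, warmup=50, seed=None):
    """Generate n_years of flows from the annual normal Markov model (equation 1)."""
    rng = random.Random(seed)
    q = mu + sigma * rng.gauss(0.0, 1.0)    # random starting value
    flows = []
    for year in range(warmup + n_years):
        t = rng.gauss(0.0, 1.0)             # unit random normal variate
        q = mu + rho * (q - mu) + t * sigma * math.sqrt(1.0 - rho * rho)
        if year >= warmup:                  # discard the warmup period
            flows.append(q)
    return flows

flows = generate_annual_flows(mu=100.0, sigma=25.0, rho=0.3, n_years=40, seed=1)
print(len(flows), min(flows), max(flows))
```

Nothing in the recursion prevents negative values, so a generated trace has to be checked for them before it is used.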

Some  authors  have  examined  more  complicated 
procedures  using  multiple  regression  to  model 
streamflow  to  include  other  "causative  factors." 
Some  have  tried  principal  component  analysis  to 
avoid the possibility of multiple collinearity (see
Matalas (12)). Others (see Mandelbrot and Wallis
(11)) have tried different generation techniques so
that  they  might  be  able  to  account  for  the  so-called 
Hurst  phenomenon  (8). 

Storage  Analysis  — Markov  Inflow 
Model 

It  is  important  now  to  address  ourselves  to  the 
generation  problem  from  a  systems-planning  point 
of view, remembering that hydrology, per se, used
as  a  description  of  past  events  is  useless.  In  this 
light,  let  us  examine  use  of  one  of  the  simplest  pos- 
sible generators  — the  annual  normal  Markov  model. 
It  will  become  fairly  obvious  what  constraints  apply 
and  that  we  lack  adequate  performance  criteria. 

For  a  reservoir  operation  study  or  mass  curve 
analysis,  the  objectives  of  the  inflow  generator 
should  be  as  follows: 

1.  Because  a  sensible  economic  horizon  of 
operation  should  be  less  than  about  40  years, 
it  is  unnecessary  to  attempt  to  model  long- 
term  trends  such  as  Hurst  phenomenon  (8). 

2.  The  future  is  unknown  so  our  most  useful 
information  (at  this  stage  of  man's  knowl- 
edge) is  the  previously  measured  record 
whether  it  be  runoff  or  precipitation  plus 
other  meteorological  and  physiographical 
data. 

3. Since it is desirable to locate information
contained  in  the  record  apart  from  the  his- 
torical sequencing  of  events,  all  that  is 
necessary  is  that  the  model  reproduce 
faithfully  the  statistical  properties  of  the 
observed data. We must obviously restrict
the number of variables used to sensibly
describe the observed data.

The model should not generate negative flows nor
exceptionally large flows; it really should only be
capable of interpolation within the observed data.

Let  us  examine  a  few  cases  where  the  model  and 
its  parameters  are  known  with  absolute  certainty. 
Postulate that the inflow is normally distributed
with parameters $\mu$, $\sigma$, and first order serial
correlation $\rho$. For specified values of $\mu$, $\sigma$, and $\rho$, specified
demand, and a given facility life we can determine
a probability distribution of storage; this latter
distribution  can  be  described  by  the  extreme  value, 
type  one  probability  distribution  (Burges  and  Linsley 
(3)).  The  extreme  value  result  provides  a  useful 
means  for  illustrating  results  of  operation  studies 
employing  stochastically  generated  traces. 

Mechanics  of  Generation 

We use equation 1 (annual model) as follows:
Select an operation period of N years; select an
initial value of runoff by a random process and
generate N years plus a warmup period of stochastic
runoff, different values of runoff volume, q, being
selected by means of Monte Carlo sampling. The
model is only useful when adequate Monte Carlo
sampling has been carried out. The last N flow
volumes that have been generated are routed
through a reservoir mass curve algorithm to obtain
a storage value $S_m$. If the reservoir had had a
capacity $S_m$, then this particular inflow sequence
{Q} would fully satisfy the demand sequence {D}.
The procedure is repeated until a sequence of M
such storage values has been obtained.
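
The whole procedure can be sketched as a single loop over traces. This is a hedged illustration, not the program used for the figures: the function name, the warmup length, the constant demand, the way negative flows are zeroed before routing, and the parameter values are all assumptions. The mass curve step takes the largest cumulative shortfall of inflow below demand as the storage for a trace.

```python
import math
import random

def storage_samples(mu, sigma, rho, demand, n_years, n_traces, warmup=50, seed=0):
    """Return the sorted storage values S_m from n_traces generated traces."""
    rng = random.Random(seed)
    root = math.sqrt(1.0 - rho * rho)
    samples = []
    for _ in range(n_traces):
        q = mu + sigma * rng.gauss(0.0, 1.0)     # random starting value
        shortfall = 0.0
        required = 0.0
        for year in range(warmup + n_years):
            q = mu + rho * (q - mu) + rng.gauss(0.0, 1.0) * sigma * root
            if year < warmup:
                continue                          # warmup flows are not routed
            inflow = max(q, 0.0)                  # assumption: negative flows zeroed
            shortfall = max(0.0, shortfall + demand - inflow)
            required = max(required, shortfall)
        samples.append(required)
    return sorted(samples)

# Demand of 90 percent of mean inflow, 40-year operation period, 1,000 traces.
samples = storage_samples(1.0, 0.25, 0.2, demand=0.9, n_years=40, n_traces=1000)
print("median storage:", samples[len(samples) // 2])
print("storage not exceeded in 99 percent of traces:", samples[int(0.99 * len(samples))])
```

Rerunning with a much smaller `n_traces` and different seeds shows how unstable the upper quantiles become when too few traces are used.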

Illustrations

We can now examine some specific combinations
of parameters used as input to equation 1. Figure
1 indicates a problem when too few Monte Carlo
samples are used, that is, when M is small. If we
recognize this and are careful to generate a
sufficiently large number of storage values (it has
been empirically found that 1,000 such values are
adequate (3)), then at least procedurally we are
accurate in this phase of our analysis. Figure 2
shows the influence of the coefficient of variation
and demand upon storage for a fixed reservoir
operation period. Figure 3 shows the influence of
first-order serial correlation upon the storage
requirements. Figure 4 shows the influence of the
operating time on storage requirements. It must be
remembered that the simple model chosen does not
reflect possible long-term trends. In the above case,
where the coefficient of variation $C_v$ equals 0.5,
about 2 percent of the flows generated could be
negative. Any negative flows were zeroed. Problems
of zero flows and flows that are excessively
large because of generation from the tails of the
normal distribution are handled effectively when
we use a large number of traces; that is, the results
of a few extreme cases are balanced out by extensive
sampling.

Figure 1. Storage required for demand = 90 percent of mean inflow: influence of the number of traces used to define the distribution of storage; normal, annual, Markov inflow generator.
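
The figure of about 2 percent negative flows can be checked directly from the normal distribution. With a coefficient of variation of 0.5, zero lies two standard deviations below the mean, so the fraction is the standard normal probability below -2. The short Python sketch below assumes only that the marginal distribution of generated flows is normal; the function name is illustrative.

```python
import math


def fraction_negative(cv):
    """P(q < 0) for a normal flow with coefficient of variation cv = sigma / mu."""
    return 0.5 * math.erfc(1.0 / (cv * math.sqrt(2.0)))


# Standard normal probability below -2 when cv = 0.5.
print(round(fraction_negative(0.5), 4))
```

Serial correlation does not change this fraction, because it alters the dependence between successive flows and not their marginal distribution.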

Figures  2  and  3  illustrate  large  spread  in  storage 
requirements  for  small  differences  in  generator 
parameters.  Even  though  the  analysis  is  not  based 
on  economic  grounds,  a  reliability  of  the  order  of 
99  percent  is  usually  desired  for  urban  water  supply. 
It  is  obvious  that  there  is  a  large  spread  of  required 
storage to yield this probability of demand satisfaction.
Note that in the above examples all the
numbers were postulated; there was no uncertainty
at all involved in the parameters $\mu$, $\sigma$, and $\rho$.

Practical  Problems 

The  hydrologist  faces  the  problem  of  real  data, 
not  the  hypothetical  cases  shown  above.  The  major 
problems  then  lie  in  model  selection,  parameter 
determination,  and  procedural  accuracy.  Can  these 
items  be  determined  from  a  straight  statistical 
approach?  How  much  physical  information  about 
the  runoff  process  can  be  employed  to  determine  a 
model  and  parameters  from  something  similar  to  a 
Bayesian  approach?  Can  we  identify  physical  reality 
in  a  one  to  one  mapping  with  a  probabilistic  model? 
If  we  could  map  reality  to  a  model,  can  we  extrap- 
olate in  our  model  beyond  the  range  of  the  observed 
streamflow  volume? 

If  we  elect  to  model  runoff  by  some  other  regres- 
sive process,  then  we  must  resolve  the  problems  of 
model  selection  and  parameter  evaluations.  The 
important point here is: what aids will the hydrologist
use to define his model, and when will the aids
be  too  weak  to  enable  him  to  model  runoff  volumes 
by  a  satisfactory  statistical-probabilistic  method? 
When  will  the  task  of  sensitivity  testing  render  the 
whole  approach  worthless? 

The  probability  distribution  defining  the  runoff 
volumes  can  be  estimated  and  tested  visually  for  a 
straight  line  fit  on  probability  paper  or  by  means 
of  a  chi-square  or  Kolmogorov-Smirnov  test.  These 
tests  are  crude;  physical  facts  are  not  incorporated, 
and  model  acceptance  is  highly  dependent  upon  a 
fairly  long  streamflow  record.  After  a  model  is 
selected,  we  then  faithfully  determine  the  appropri- 
ate parameters  (mean,  variance,  correlation,  and 
possibly  skew).  How  good  are  these  estimates? 
We  can  estimate  the  variance  of  the  mean  by  use 
of  the  central  limit  theory.  We  can  make  some 
estimate  of  the  spread  of  the  variance  by  treating 
it  as  a  chi-square  distribution.  How  do  we  know 
if  there  is  any  significant  correlation?  We  have 
Anderson's (1) test for significance, but even with
50 years of available data, calculated correlations
must be greater than 0.2 at the 90-percent confidence
limit to be statistically different from zero.
Correlations  that  are  not  statistically  distinguish- 
able from  zero  are,  however,  quite  important. 
Reference  to  figure  3  shows  that  storage  is  quite 
dependent  upon  small  correlations  in  annual  data. 
Statistical  estimates  by  themselves  are  obviously 
inadequate to describe probabilistic models and
certainly  inadequate  to  give  realistic  values  of 
"persistence."  Remember  that  the  model  so  far 
discussed  is  the  simplest  case.  The  problem  is  not 
simplified  when  added  dimensions  are  included. 
Typically,  the  next  refinement  is  to  use  either  a 
seasonal  or  a  monthly  runoff  generation  model 
(equation  2).  The  problems  inherent  in  an  annual 
model  are  thus  amplified. 
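
The threshold of about 0.2 quoted for 50 years of data can be reproduced with a few lines of Python. The sketch assumes the large-sample form of Anderson's limits commonly quoted for the lag-one serial correlation of an uncorrelated series, r = (-1 ± z sqrt(N - 2)) / (N - 1), applied as a two-sided test; the function name and default confidence level are illustrative.

```python
from statistics import NormalDist


def anderson_limits(n_years, confidence=0.90):
    """Two-sided limits for the lag-one serial correlation of an uncorrelated series."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)   # standard normal deviate
    root = (n_years - 2) ** 0.5
    lower = (-1.0 - z * root) / (n_years - 1)
    upper = (-1.0 + z * root) / (n_years - 1)
    return lower, upper


lower, upper = anderson_limits(50)
print(round(lower, 3), round(upper, 3))
```

Under these assumptions, a computed positive correlation below the upper limit cannot be distinguished from zero, even though a correlation of that size has a marked effect on required storage.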

We can reduce parameter uncertainty by lengthening
the streamflow base from which the model
parameters are determined. This can be done by
correlation (Fiering (6)) or by simulating the
watershed (Crawford and Linsley (5)). At best we
can extend the streamflow record to have the same
length as the longest precipitation record of a
nearby precipitation gage.

Figure 2. Storage distributions (defined by 1,000 traces) for constant and linearly increasing demand (annual increments); normal, annual, Markov inflow generator; high and low inflow variability.

It  is  most  important  to  recognize  the  relative 
magnitude of the problems. The problem of parameter
estimation is obviously less severe for relatively
uniform conditions, where annual coefficients of
variation of runoff are low, than it is for
higher coefficients of variation (see, for example,
fig. 2).

In monthly models, the coefficient of variation
will exceed 0.5 if the annual value is 0.5, and correlation
will indeed be large: flow volumes in some
months commonly have correlations with preceding
months of the order of 0.9. Parameter estimates
for these individual months therefore will have
very large uncertainty associated with them.
Refinement below a 1-month period would introduce
so much uncertainty into parameter estimates
as to be useless for modeling except when the flow
is invariant. Fortunately, for most reservoir studies
a monthly time increment is not too coarse.

The  hydrologist  then  must  lean  heavily  toward  the 
mathematically  inclined  for  answers  to  the  following 
questions.  What  are  the  mathematical  limitations 
of  a  stochastic  runoff  generation  model?  When  does 
the  model  become  so  data  dependent  that  it  is 
useless?  When  can  we  supplement  statistical  in- 
ferences legitimately  with  physical  reasoning? 
How  much  determinism  based  on  physical  rea- 
soning should  be  incorporated  into  a  runoff  model? 
Given our parameter estimation uncertainty, is it
reasonable for a runoff generation model to attempt
to statistically simulate the future, assuming it to
be basically the same as the past? How good
an end product from the analysis can we expect
given the uncertainty of model inputs? When, because
of uncertainty, is it prudent to cease attempting
to model actual runoff phenomena by autoregressive
or multiple regression approaches? How
much  can  be  gleaned  from  a  statistical  treatment  of 
a  watershed? 
Figure 3. Influence of serial correlation in the inflow sequence on the required storage distribution; normal, annual, Markov inflow generator (each distribution defined by 1,000 traces).

Figure 4. Influence of economic life on required storage; normal, annual, Markov inflow generator (each distribution defined by 1,000 traces).


The last question brings us to an important realization,
because we are historically at a stage of technological
development where more and more problems
that once could only be studied statistically
can be treated in a semideterministic way. Given
that runoff from a watershed is the end product of
complex interactions with incident precipitation,
interactions that have more influence than that of
simply smoothing the precipitation, would we be
better able to do hydrologic modeling by modeling
precipitation and routing the precipitation through
a deterministic watershed model when the streamflow
is variable and only a short streamflow record
exists? Such treatment would be far superior in
measuring watershed response to realistic atmospheric
fluctuations. Analysis costs clearly will be
an important factor.

Summary and Conclusions

There is no question that statistical models of
runoff are very useful when there is little variation
in runoff. As runoff variability increases, there probably
exists a limit to the information that statistical
models can yield, principally because of lack of
confidence in the model basis and in the accuracy
of model parameters. We need a reasonably accurate
identification of these limits for autoregressive
and multiple regression models that are used
to describe runoff volumes. Limits of usefulness of
stochastic runoff models could possibly be established
in terms of an index of data variability as
well as interperiod correlation and the intended
application of the model. Multiple site models
would possibly have tighter limits on individual site
variability of data.
A STOCHASTIC MODEL OF SEDIMENT YIELD FOR
EPHEMERAL  STREAMS  ' 


By D. A. Woolhiser and P. Todorovic


Abstract 

The  accumulated  sediment  yield  of  an  ephemeral 
stream  in  some  time  interval  (0,  t]  may  be  treated 
as  the  sum  of  a  random  number  of  random  variables. 
The  number  of  sediment  yield  events  in  (0,  t]  is 
equal  to  the  number  of  runoff  events  and  is  related 
to  the  number  of  precipitation  events.  Three 
progressively more complicated rainfall-runoff
models  are  postulated:  (1)  the  pure  threshold  model, 
(2)  the  general  threshold  model,  and  (3)  the  in- 
filtration model.  Analytic  results  for  the  runoff 
counting  process  are  obtained  for  the  pure  threshold 
model  and  an  approximation  to  the  infiltration  model 
is  proposed.  The  resulting  runoff  counting  process 
gave  a  good  fit  to  data  from  two  small  watersheds 
near  Hastings,  Nebr.  An  example  is  given  of  the 
application of the stochastic model of sediment
yield  to  the  problem  of  estimating  mean  and 
variance  of  annual  sediment  yield  if  only  short 
concurrent  records  of  sediment  yield  and  runoff 
are  available,  but  longer  records  of  precipitation 
and  runoff  can  be  utilized. 


Introduction 

The  process  involving  transport  of  sediment 
past  any  section  of  a  stream  channel  system  is 
clearly  stochastic  in  the  sense  that  we  cannot  make 
deterministic  predictions  about  the  future  behavior 
of  the  system,  given  the  present  state. 

Estimates  of  the  sediment  yield  of  a  watershed 
are  important  in  the  design  of  dams,  canals,  and 
other  structures  and  in  evaluating  land  management 
practices. Recently, studies of sediment yield have
emphasized the importance of sediment as a pollutant
or as a carrier of substances, such as radioactive
materials, pesticides, or nutrients, which may
be pollutants. In all of these applications, the
variation of the sediment yield from year to year as
well as the mean value is important. Unfortunately,
records  of  sediment  transport  by  streams  are  usually 
much  shorter  than  records  of  streamflow  or  are  not 
available  at  all.  For  this  reason,  it  is  desirable  to 
construct  a  formal  stochastic  model  of  sediment 
yield  that  has  a  structure  dependent  on  the  runoff 
process  and  the  rainfall  process.  Such  a  model  will 
provide  a  framework  for  the  analysis  of  short  rec- 
ords of  sediment  yield.  If  the  model  has  an  appro- 
priate structure,  it  may  be  possible  to  relate  the 
model  parameters  to  climatic,  geologic,  or  land 
management  factors. 

Several  investigators  have  considered  annual  sedi- 
ment yield  from  the  frequency  viewpoint  (7,  8,  15), 
but  they  were  primarily  concerned  with  demonstrat- 
ing the  relative  importance  of  extreme  events  and 
did  not  construct  a  formal  stochastic  model.  Murota 
and  Hashino  (6)  developed  a  stochastic  model  of 
sediment  yield  based  upon  stochastic  models  of 
streamflow  and  precipitation.  Their  model  has 
many desirable features but is so complicated that
they  had  to  resort  to  simulation  techniques  to  obtain 
the  distribution  of  sediment  yield. 

The  objective  of  this  paper  is  to  present  a  sto- 
chastic model  of  sediment  yield  that  is  based  upon 
stochastic  models  of  streamflow  and  precipitation 
and  yet  is  simple  enough  that  analytic  results  can 
be  obtained  for  the  mean  and  variance  of  the 
annual  sediment  yield  or  the  accumulated  yield  after 
a period of T years. This model will be limited to
ephemeral streams, that is, streams that carry only surface
runoff and therefore flow only during and after periods
of precipitation or snowmelt.

Description  and  Definitions 

Sediment transported by streamflow is frequently
divided into three classes: bedload, suspended
bed material load, and wash load (2). Both the bedload
and the suspended bed material load are
considered  to  have  their  primary  source  in  the  bed 
material  over  which  the  flow  passes,  whereas  the 
wash  load  is  assumed  to  be  very  fine  sediments 
originating  mostly  from  erosion  of  land  surfaces 
and not present in significant amounts in the streambed.
Bedload transport is concentrated near the
bed where particles roll or slide or may move in
a series of short jumps. The suspended bed material
load has its highest concentration near the bed of
the stream, and the concentration varies with
distance above the streambed in a manner consistent
with a process governed by turbulent
diffusion. The concentration of the wash load is
practically independent of position above the
streambed and depends primarily on the availability
of material to be transported into the stream
rather than on hydraulic parameters. Although the
instantaneous  rate  of  bedload  transport  through 
a  stream  cross  section  under  steady  flow  conditions 
may  exhibit  large  fluctuations,  the  time-averaged 
mean  rate  is  considered  to  be  a  unique  function 
of  the  flow  parameters.  Thus  one  would  expect 
some  correlation  between  total  bed  material  trans- 
port rate  and  hydraulic  flow  parameters. 

Suppose  that  we  have  continuous  measurements 
of  precipitation  intensity  over  a  catchment,  and 
simultaneous  measurements  of  runoff  rate  and  total 
sediment transport rate at the mouth of the catchment.
Let $\xi_1(t)$, $\xi_2(t)$, and $\xi_3(t)$ denote the
spatially averaged precipitation rate, runoff rate,
and the sediment transport rate, respectively. Let
$\xi(t)$ denote observations of the rainfall-runoff-sediment
transport phenomenon, where

$$\xi(t) = \{\xi_1(t),\ \xi_2(t),\ \xi_3(t)\}, \qquad (1)$$

and suppose that the observations begin at time
$t = 0$. Because $\xi_i(t)$ is a random variable for all
$t > 0$ and $i = 1, 2, 3$, we have three families of
random variables:

$$\{\xi_i(t);\ t > 0\}, \quad i = 1, 2, 3 \qquad (2)$$

or three continuous parameter stochastic processes.
Under the assumption that the $\xi_i(t)$ are stochastically
continuous processes, which seems intuitively reasonable,
the $\xi_i(t)$ are separable and measurable
random functions (10). The following self-explanatory
assumptions are required for further
development:

1. $P\{\xi_i(t) \ge 0\} = 1$, for $i = 1, 2, 3$ and all $t \ge 0$.

2. $P\{\xi_i(s) = 0,\ \forall s \in (t, t + \Delta t)\} > 0$, for $i = 1, 2, 3$, $t \ge 0$, and $\Delta t > 0$.

Because the $\xi_i(t)$ are separable, the set $\{\xi_i(s) = 0,\ \forall s \in (t, t + \Delta t)\}$ is measurable.

We shall define a rainfall event as any continuous
period of rainfall where $\xi_1(t) > 0$. A runoff event is
any continuous period of surface runoff, and we
shall assume that each runoff event causes a sediment
yield event. Typical sample functions of the
processes $\xi_i(t)$ are shown in figure 1.

Associated with the $\nu$th rainfall event is the time
of ending $T_\nu^{(1)}$, the duration $Z_\nu^{(1)}$, and the total volume
of rainfall $X_\nu^{(1)}$ where

$$X_\nu^{(1)} = \int_{T_\nu^{(1)} - Z_\nu^{(1)}}^{T_\nu^{(1)}} \xi_1(s)\,ds. \qquad (3)$$

Similarly, $T_\nu^{(2)}$ refers to the time of ending of the
$\nu$th runoff event and is the same as $T_\nu^{(3)}$. The duration
of the $\nu$th runoff event $Z_\nu^{(2)}$ is the same as
$Z_\nu^{(3)}$, and the volume of runoff and sediment yield
for the $\nu$th event are given by

$$X_\nu^{(2)} = \int_{T_\nu^{(2)} - Z_\nu^{(2)}}^{T_\nu^{(2)}} \xi_2(s)\,ds \qquad (4)$$

$$X_\nu^{(3)} = \int_{T_\nu^{(3)} - Z_\nu^{(3)}}^{T_\nu^{(3)}} \xi_3(s)\,ds. \qquad (5)$$

Finally, denote by $X_i(t)$ the integral

$$X_i(t) = \int_0^t \xi_i(s)\,ds, \qquad (6)$$

which exists with probability 1, because $\xi_i(t)$ is
a measurable process for $i = 1, 2, 3$ (4).

$X_1(t)$ represents the accumulated precipitation
in the interval (0, t], $X_2(t)$ represents the total
amount of runoff, and $X_3(t)$ represents the total
yield of sediment (mass or weight) in the same
interval. $X_i(t)$, $i = 1, 2, 3$, is a process of nondecreasing
sample functions because $X_i(t) \le X_i(t + \Delta t)$.
Figure 1. Sample functions of the process $\xi(t)$ (panels: rainfall, runoff, sediment transport).



Counting  Processes  for  Rainfall, 
Runoff,  and  Sediment  Yield 

We define the stochastic counting processes
$N_i(t)$, $i = 1, 2, 3$ as follows:

$$N_i(t) = \sup\{\nu : T_\nu^{(i)} \le t\}, \quad i = 1, 2, 3;\ t \ge 0 \qquad (7)$$

where $N_1(t)$ is the number of complete rainfall
events in (0, t] and $N_2(t)$ and $N_3(t)$ are the numbers
of complete runoff and sediment yield events in
(0, t]. We have assumed that every runoff event is
also a sediment yield event or

$$N_2(t) = N_3(t).$$

Because of the physical relationship between rainfall
and runoff, $N_2(t)$ is dependent upon $N_1(t)$ and
for all $t \ge 0$,

$$N_2(t) \le N_1(t). \qquad (8)$$

It is evident from inequality 8 that for every $i > j$

$$\{N_1(t) = j\} \cap \{N_2(t) = i\} = \theta, \qquad (9)$$

where $\theta$ denotes the impossible event.

From the definition of the counting process, it
can be shown that for $i = 1, 2, 3$,

$$P\{T_\nu^{(i)} \le t\} = \sum_{\kappa=\nu}^{\infty} P\{N_i(t) = \kappa\}; \quad \nu = 0, 1, 2, \ldots;\ t \ge 0.$$

Because the sequence of events $\{N_i(t) = 0\}$,
$\{N_i(t) = 1\}$, ... represents a countable partition
of the sample space for $i = 1, 2, 3$, it follows that

$$P\{N_2(t) = \nu\} = P\{N_3(t) = \nu\} = \sum_{n=\nu}^{\infty} P\{N_1(t) = n,\ N_2(t) = \nu\}. \qquad (10)$$

Under certain assumptions, the rainfall counting
process can be described by the time dependent
(nonhomogeneous) Poisson process (11) as

$$P\{N_1(t) = n\} = \frac{[\Lambda_1(t)]^n \exp\{-\Lambda_1(t)\}}{n!} \qquad (11)$$

where $\Lambda_1(t) = \int_0^t \lambda(s)\,ds$. In this expression, $\lambda(s)$ is
the intensity function and $\Lambda_1(t)$ is the expected
value function.

By utilizing equation 10 and the conditional
probability

$$\psi_n(t, \nu) = P\{N_2(t) = \nu \mid N_1(t) = n\}; \quad n = \nu, \nu + 1, \ldots; \quad \nu = 0, 1, 2, \ldots$$

the runoff and sediment yield processes can be
written as

$$P\{N_2(t) = \nu\} = \sum_{n=\nu}^{\infty} \psi_n(t, \nu)\, P\{N_1(t) = n\}. \qquad (12)$$

It is evident that the nature of the conditional
probability $\psi_n(t, \nu)$ is dependent on the chance
mechanism which selects a runoff event from a
rainfall event. Several such mechanisms can be
postulated; however, as their structure becomes
more reasonable from a physical standpoint they
become mathematically rather intractable. We shall
consider three models describing the rainfall-runoff
process: (1) the pure threshold model, (2) the
general threshold or storage model, and (3) the
infiltration model. Each of these is a lumped parameter
model in that spatial variations will not be
considered. Furthermore, they are strictly limited
to the prediction of rainfall excess, the rate of free
water accumulation on the soil surface, rather than
watershed runoff. For very small watersheds, these
are nearly identical, but for large watersheds the
attenuation of the rainfall excess rate by watershed
hydraulics must be considered.

1. The pure threshold model. Assume that for
each rainfall event, a certain threshold amount of
precipitation $m$ must be exceeded before runoff
occurs. The $\nu$th rainfall event will be a runoff
event only if $X_\nu^{(1)} > m$.

2. The general threshold model. The action of
a drainage basin in transforming rainfall to rainfall
excess is considered to be analogous to the behavior
of a single reservoir with a maximum capacity, $m$.
At any moment, the quantity of water in the soil
moisture reservoir is $s(t)$, and $m - s(t)$ is the soil
moisture deficit. The reservoir is replenished by
rainfall (assumed to be an instantaneous input)
and is depleted by evapotranspiration and deep
seepage. Runoff will occur from the $\nu$th rainfall
event if $X_\nu^{(1)} \ge m - s(T_\nu^{(1)})$. Between events, the
depletion of storage is governed by an equation of
the form

$$\frac{ds}{dt} = \text{[right-hand side illegible in source]} \qquad (13)$$

A linear form of this equation has been used frequently
by hydrologists to estimate soil moisture
depletion by evapotranspiration and deep seepage.

3. The infiltration model. The infiltration approach
has the soundest physical basis of the
three models. The principle of the infiltration
approach is simple: Runoff occurs when the rainfall
intensity exceeds the rate of infiltration. For
any soil with a specified initial water content and
distribution, the maximum rate at which water
can be absorbed through the surface is known as
the infiltration capacity, $f$. If water is ponded at
the surface at time $t = 0$, the infiltration capacity
curve typically decreases rapidly with time and
approaches a minimum value, $f_c$. The variation
of infiltration capacity with time may be expressed
by the following empirical equation (5):

$$f = f_c + a\,(m - s(t))^n \qquad (14)$$

where $f_c$ is the minimum infiltration capacity; $a$ and
$n$ are parameters constant for a given soil-vegetation
complex, and $(m - s(t))$ is the remaining volume
of potential storage. For the $\nu$th rainfall event,
the actual infiltration rate is equal to the rainfall
rate, $\xi_1(t)$, if $\xi_1(t) \le f$ and is equal to $f$ otherwise.
It is necessary to know the initial deficit, $m - s_0$, at
the beginning of the event and to have a method
for depleting storage by evapotranspiration and
seepage between events. The model given by
equation 13 could also be used for depleting storage
in this model.

For a simplified explanation of the rainfall-runoff
process using the infiltration model, suppose that
rainfall occurs at a constant intensity $\xi_\nu^{(1)}$ with
duration $Z_\nu^{(1)}$, where $\xi$ and $Z$ are continuous random
variables. A rainfall event will then be represented
by a point on the plane (fig. 2). The intensity
histogram is the rectangular pulse oabc. If the initial
deficit, $m - s_0$, is large, the infiltration capacity
curve during the rainfall event might follow the
curve de, in which case no runoff would occur.

If the initial deficit were small, the infiltration
capacity curve might follow curve segment fg, and
the volume of runoff would equal the crosshatched
area. Clearly, runoff will occur if $\xi_\nu^{(1)} > f$ for
$(T_\nu^{(1)} - Z_\nu^{(1)}) < t \le T_\nu^{(1)}$. The infiltration model can
be combined with a threshold model to account for
interception, the amount of precipitation retained
on leaves and foliage.
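
The behavior described for a rectangular rainfall pulse can be illustrated with a small numerical sketch in Python. It steps equation 14 forward in time with a simple fixed time step, ignores depletion of storage during the event, and holds storage at the capacity m once the deficit is exhausted. These choices, the function name, and the parameter values in the usage note are assumptions made for illustration only.

```python
def event_runoff(intensity, duration, s0, m, fc, a, n, dt=0.001):
    """Runoff volume from a constant-intensity pulse under f = fc + a (m - s)^n."""
    s = s0
    runoff = 0.0
    for _ in range(int(round(duration / dt))):
        f = fc + a * max(m - s, 0.0) ** n      # infiltration capacity, equation 14
        infiltration = min(intensity, f)       # actual rate cannot exceed the rainfall rate
        runoff += (intensity - infiltration) * dt
        s = min(m, s + infiltration * dt)      # storage fills but never exceeds capacity m
    return runoff
```

With hypothetical values `m=2.0, fc=0.3, a=1.0, n=1.4`, a one-hour pulse of intensity 1.0 produces no runoff when the soil starts dry (`s0=0.0`) and runoff at rate 0.7 when storage is already full (`s0=2.0`), which mirrors the contrast between curves de and fg in the text.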

The conditional probability $\psi_n(t, \nu)$ for the pure
threshold model is easily obtained. Runoff occurs
only if $X_\nu^{(1)} > m$. Therefore the probability of runoff
occurring for the $\nu$th rainfall event is

$$\pi = P\{X_\nu^{(1)} > m\} \qquad (15)$$

where $m$ is the constant threshold. This model can
be generalized by assuming that the threshold $m$
varies from one event to another in a random manner
independent of the rainfall process. The probability
of a runoff event given that a rainfall event has
occurred is then

$$\pi = 1 - \int_0^{\infty}\!\!\int_0^{m} f(X_\nu^{(1)}, m)\, dX_\nu^{(1)}\, dm \qquad (16)$$

where $f(X_\nu^{(1)}, m)$ is the joint density function of the
random variables $m$, $X_\nu^{(1)}$. If the $X_\kappa^{(1)}$, $\kappa = 1, 2, \ldots$
are independent, identically distributed random
variables, the conditional probability is

$$\psi_n(t, \nu) = \binom{n}{\nu} \pi^{\nu} (1 - \pi)^{n - \nu}. \qquad (17)$$

The runoff counting process for the pure threshold
model can be obtained from equations 11, 12, and 17:

$$P\{N_2(t) = \nu\} = \exp\{-\Lambda_1(t)\} \sum_{n=\nu}^{\infty} \binom{n}{\nu} \pi^{\nu} (1 - \pi)^{n - \nu} \frac{[\Lambda_1(t)]^n}{n!} \qquad (18)$$

Takacs (11) has shown that equation 18 is a Poisson
process with parameter $\pi \Lambda_1(t)$. Therefore, if the
pure threshold process is appropriate, the runoff
counting process should be Poissonian if the rainfall
process is Poissonian.
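
The statement that equation 18 reduces to a Poisson distribution with parameter equal to the runoff probability times the expected number of rainfall events can be checked numerically. The Python sketch below evaluates the sum in equation 18 directly, truncating it at a hypothetical upper limit `n_max`, and compares it with the Poisson probability; the function names and the test values are illustrative.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability of k events, computed in logs for numerical stability."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def runoff_count_pmf(v, big_lambda, pi, n_max=150):
    """Evaluate the sum in equation 18, truncated at n_max rainfall events."""
    total = 0.0
    for n in range(v, n_max + 1):
        binomial = math.comb(n, v) * pi ** v * (1.0 - pi) ** (n - v)
        total += binomial * poisson_pmf(n, big_lambda)
    return total
```

For instance, with an expected 6 rainfall events and a runoff probability of 0.3, `runoff_count_pmf(v, 6.0, 0.3)` agrees with `poisson_pmf(v, 1.8)` to within rounding error for each small `v`.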

The time varying behavior of soil moisture content
in the general threshold model is analogous to
the contents of a finite dam with stochastic input
and continuous outflow. The finite dam with constant
outflow rate has been studied by Takacs (12).
With the general threshold model the time intervals
between runoff events form the sequence
$L_\nu = \{T_\nu^{(2)} - T_{\nu-1}^{(2)}\}$, $\nu = 1, 2, \ldots$ To obtain the runoff
counting process, we must find the distribution of
$L$, where $L$ is the recurrence time

$$L(m) = \inf\{t - T_{\nu-1}^{(2)} : s(t) = m \mid s(T_{\nu-1}^{(2)}) = m\}.$$

This problem has apparently not been considered
in finite dam theory, so the solution is not presently
available.

For an elementary exposition of the infiltration
model, suppose that the rainfall intensity for the $\nu$th
event is constant for that event and is given by $\xi_\nu^{(1)}$
and that the duration of the $\nu$th event is $Z_\nu^{(1)}$, where
$\xi_\nu^{(1)}$ and $Z_\nu^{(1)}$ are jointly distributed random variables.
The time varying behavior of the soil moisture content
is analogous to the contents of a finite dam with
stochastic input and continuous outflow.

To each initial soil moisture deficit $m - s(t)$ there
will correspond an infiltration capacity envelope
curve $f(m - s, t)$. Therefore, for each initial condition
and each rainfall rate $\xi_\nu^{(1)}$ there is a certain
rainfall duration, $\tau_\nu$, that must be exceeded if runoff
is to occur. (See fig. 3.) The critical rainfall duration
for the $\nu$th event, $\tau_\nu$, depends upon the soil moisture
storage at the time of the $\nu$th event and upon $\xi_\nu^{(1)}$.
Therefore the probability of runoff occurring, given
that a rainfall event occurred, is equal to $P\{Z_\nu^{(1)} > \tau_\nu\}$,
and the recurrence time between events is
$L(m, s) = \inf\{t : Z_\nu^{(1)} > \tau_\nu \mid s(T_{\nu-1}^{(2)}) = m\}$. A solution
for the distribution of the recurrence time $L$ for the
infiltration model is even more difficult than for the
general threshold model.


Figure 3. Infiltration capacity envelope curve $f(m - s, t)$.

Because analytic solutions for the conditional
distribution $\psi_n(t, \nu)$ are not available for the
general threshold model or for the infiltration model,
we offer the hypothesis that the following chance
mechanism may result in a useful description of the
runoff counting process.

Consider a particular period of the year (May, for
example). Each year $N_1$ rainfall events will occur
in this period, where $N_1$ is a random variable which
we will assume follows a Poisson distribution. $N_2$
runoff events will be selected from the $N_1$ rainfall
events according to a binomial process with parameter
$\pi$. Because of year-to-year differences in soil
moisture storage at the time of a rainfall event, the
parameter $\pi$ is a random variable, $0 \le \pi \le 1$. It is
obvious that

$$P\{N_2(t) = \nu \mid N_1(t) = n,\ \pi = p\} = \binom{n}{\nu} p^{\nu} (1 - p)^{n - \nu}. \qquad (19)$$

To obtain the conditional probability $\psi_n(t, \nu)$, we
must form the function $P\{N_2(t) = \nu,\ \pi = p \mid N_1(t) = n\}$
and integrate it with respect to $\pi$.
Now if $\pi$ is independent of $N_1(t)$,

$$P\{N_2(t) = \nu,\ \pi = p \mid N_1(t) = n\} = \binom{n}{\nu} \pi^{\nu} (1 - \pi)^{n - \nu} f(\pi)$$

where $f(\pi)$ is the density function of $\pi$. The conditional
probability $\psi_n(t, \nu)$ is therefore

$$\psi_n(t, \nu) = \binom{n}{\nu} \int_0^1 \pi^{\nu} (1 - \pi)^{n - \nu} f(\pi)\, d\pi. \qquad (20)$$

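Now let us assume that the distribution of $\pi$
between years is given by the beta distribution

$$f(\pi) = \frac{(\alpha + \beta + 1)!}{\alpha!\,\beta!}\, \pi^{\alpha} (1 - \pi)^{\beta}, \quad 0 \le \pi \le 1. \qquad (21)$$

Performing the integration indicated in equation
20, we obtain

$$\psi_n(t, \nu) = \frac{\binom{n - \nu + \beta}{n - \nu} \binom{\nu + \alpha}{\alpha}}{\binom{n + \alpha + \beta + 1}{n}}. \qquad (22)$$

This distribution was originally proposed by Skellam
(9) and has been called the negative hypergeometric
distribution (7).

The expression for the runoff and sediment yield
counting process can be obtained by substituting
equations 11 and 22 into equation 12:

$$P\{N_2(t) = \nu\} = P\{N_3(t) = \nu\} = \exp\{-\Lambda_1(t)\} \sum_{n=\nu}^{\infty} \frac{[\Lambda_1(t)]^n}{n!}\, \frac{\binom{n - \nu + \beta}{n - \nu} \binom{\nu + \alpha}{\alpha}}{\binom{n + \alpha + \beta + 1}{n}}. \qquad (23)$$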
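
As a quick consistency check on equations 21 and 22, the conditional distribution can be evaluated for integer values of the two beta parameters, which is what the factorial notation implies. The Python sketch below is illustrative only; the function names are hypothetical, and the expression for the mean of the runoff probability follows from the moment formula given for the beta density with r = 1.

```python
from math import comb


def psi(n, v, alpha, beta):
    """Equation 22 for integer alpha and beta: P(v runoff events | n rainfall events)."""
    return comb(n - v + beta, n - v) * comb(v + alpha, alpha) / comb(n + alpha + beta + 1, n)


def mean_pi(alpha, beta):
    """E{pi} for the beta density of equation 21."""
    return (alpha + 1) / (alpha + beta + 2)
```

For any n, the values `psi(n, v, alpha, beta)` for v = 0, ..., n should sum to one, and their mean should equal `n * mean_pi(alpha, beta)`, the expected number of runoff events given n rainfall events.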


In Appendix A it is shown that the first two
moments of $N_2(t)$ are:

$$E\{N_2(t)\} = E\{\pi\}\, E\{N_1(t)\} \tag{24}$$

$$E\{N_2(t)^2\} = E\{\pi\}\, E\{N_1(t)\} + E\{N_1(t)\}^2\, E\{\pi^2\} \tag{25}$$

where $E\{\cdot\}$ denotes the mathematical expectation
and

$$E\{\pi^r\} = \frac{(\alpha + \beta + 1)!\, (\alpha + r)!}{(\alpha + \beta + r + 1)!\, \alpha!}.$$

It is apparent that the chance mechanism which
leads to equation 23 is not the same as that in the
general threshold model or the infiltration model.
However, an examination of the expression for the
mean reveals that the parameters $\alpha$ and $\beta$ may be
related to the soil properties of the watershed. For
example, as $\alpha \to \infty$, $E\{N_2(t)\} \to E\{N_1(t)\}$ or the
watershed is impervious; as $\beta \to \infty$, $E\{N_2(t)\} \to 0$
and no runoff occurs.

The Sediment Yield Process

The sediment yield process for an ephemeral
stream consists of a series of events, each of which
has relatively short duration. In this case, the
process $X_3(t)$ defined by the stochastic integral 6
may be approximated by the following sum

$$\underline{X}_3(t) = \sum_{\nu=1}^{N_3(t)} X_\nu^{(3)}. \tag{26}$$

Actually we have (see fig. 1)

$$X_3(t) - \underline{X}_3(t) = \int_{\tau^{(3)}_{N_3(t)}}^{t} \xi_3(s)\, ds \qquad (\tau_0^{(3)} = 0).$$

If $\xi_3(s) = 0$ for all $s \in (\tau^{(3)}_{N_3(t)}, t]$, it follows that
$X_3(t) = \underline{X}_3(t)$. Since the events have relatively
short duration, the probability of having $\xi_3(s) > 0$
for $s \in (\tau^{(3)}_{N_3(t)}, t]$ is not very high, so that $\underline{X}_3(t)$
represents a fairly good approximation of $X_3(t)$ for all $t$.
Writing

$$\overline{X}_3(t) = \sum_{\nu=1}^{1 + N_3(t)} X_\nu^{(3)},$$

we apparently have

$$\underline{X}_3(t) \le X_3(t) \le \overline{X}_3(t).$$

The reason for using the approximation $\underline{X}_3(t)$ of
the process $X_3(t)$ is simple. Methods for computation
of the distribution function of the stochastic
integral 6 are not available, whereas it is very easy
to determine the distribution function of the process
$\underline{X}_3(t)$.

At this point, it is necessary to make some
assumptions regarding the sediment yield for each
runoff event $X_\nu^{(3)}$. We will consider three cases:
(1) The sediment yield per event is an independent,
identically distributed random variable with density
function $f(x)$. (2) The sediment yield per event is
independent but is drawn from one of two distributions:
$F_1(x)$ during the early spring season, before vegetation
is well established, and $F_2(x)$ during the remainder
of the year. (3) Sediment yield per event
is correlated with hydrologic characteristics of
the runoff event or with the rainfall energy per
event. Dragoun (3), for example, found the following
regression forms:

$$X^{(3)} = a + bE; \qquad R^2 = 0.613$$
$$X^{(3)} = a + b(Q + q_p); \qquad R^2 = 0.811$$

where $E$ is the kinetic energy of the rainfall event,
$Q$ is the volume of storm runoff, $q_p$ is the peak
rate of storm runoff, $a$ and $b$ are regression
coefficients, and $R^2$ is the coefficient of determination.

Case 1. If the sediment yield per event represents
a sequence of independent, identically distributed
random variables independent of $N_3(t)$ for all $t$,
then:

$$F_3(x \mid t) = P\{N_3(t) = 0\} + \sum_{\nu=1}^{\infty} P\{N_3(t) = \nu\} \int_0^x \{f(u)\}^{\nu *}\, du \tag{27}$$

where $F_3(x \mid t) = P\{\underline{X}_3(t) \le x\}$, $f(x)$ is the
density function of sediment yield per event for
$x > 0$, and $\nu*$ indicates the $\nu$th convolution of $f(x)$
with itself. The mean and the variance are:

$$E\{\underline{X}_3(t)\} = E\{N_3(t)\}\, E\{X^{(3)}\} \tag{28}$$

$$\operatorname{var}\{\underline{X}_3(t)\} = E\{N_3(t)\}\, \operatorname{var}\{X^{(3)}\} + \operatorname{var}\{N_3(t)\}\, E^2\{X^{(3)}\}. \tag{29}$$

Thus,  the  mean  and  variance  of  the  total  sediment 
yield  in  the  time  interval  (0,  t]  can  be  estimated 
quite  readily  from  the  mean  and  the  variance  of 
the  runoff  counting  process  and  the  mean  and  vari- 
ance of  the  sediment  yield  per  event. 
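
The random-sum formulas of equations 28 and 29 can be checked by simulation. The sketch below is illustrative only: it assumes a Poisson event count and exponentially distributed yields per event (neither is claimed for the Hastings data), and the rate and mean yield are hypothetical values.

```python
import random
import statistics

def compound_moments(mean_n, var_n, mean_x, var_x):
    """Equations 28 and 29: mean and variance of a sum of a random number of yields."""
    mean_total = mean_n * mean_x
    var_total = mean_n * var_x + var_n * mean_x ** 2
    return mean_total, var_total

def simulate_totals(rate, mean_x, years, rng):
    """Simulate yearly totals with a Poisson event count and exponential yields."""
    totals = []
    for _ in range(years):
        # Count Poisson events in one year from exponential inter-arrival times.
        count, clock = 0, rng.expovariate(rate)
        while clock < 1.0:
            count += 1
            clock += rng.expovariate(rate)
        totals.append(sum(rng.expovariate(1.0 / mean_x) for _ in range(count)))
    return totals

rng = random.Random(1971)
rate, mean_x = 10.0, 250.0      # hypothetical: 10 events per year, 250 tons per event
var_x = mean_x ** 2             # variance of an exponential yield
theory_mean, theory_var = compound_moments(rate, rate, mean_x, var_x)
totals = simulate_totals(rate, mean_x, 20000, rng)
sim_mean = statistics.mean(totals)
sim_var = statistics.variance(totals)
print(theory_mean, sim_mean, theory_var, sim_var)
```

For a Poisson count the mean and variance of the number of events are equal, which is why `rate` is passed twice to `compound_moments`; for the counting process of equation 23 the two would differ.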

Case 2. Let the year be divided into two seasons
and let time $t = 0$ coincide with the beginning of one
of the seasons. Consider the process $\underline{X}_3(T)$ defined
only for integer values of the years $T = 1, 2, 3, \ldots$:

$$\underline{X}_3(T) = {}_1\underline{X}_3(T) + {}_2\underline{X}_3(T), \tag{30}$$

where the leading subscript refers to the season.
Furthermore: 

If the seasonal counting processes and the sediment
yields are independent, the density function $f\{\underline{X}_3(T)\}$
can be obtained by convolving the density functions
of the seasonal yields, and the mean and variance of
$\underline{X}_3(T)$ are the sums of the means and variances of
${}_1\underline{X}_3(T)$ and ${}_2\underline{X}_3(T)$, respectively.

Case 3. This case will be considered briefly in this
paper. Suppose that sediment yield per event is
related to peak discharge $q_p$ by a regression
equation of the form

$$\ln X^{(3)} = \ln a + b \ln q_p + \epsilon, \tag{31}$$

where the error $\epsilon$ is normally distributed with mean 0
and variance $\sigma^2$ and is independent of $q_p$. If, as is
frequently the case, we have relatively short
concurrent records of sediment yield and runoff but
much longer records of runoff alone, we might obtain
more reliable estimates of sediment yield by
utilizing the relationship of equation 31 to estimate
the distribution of sediment yield $f(x)$. The density
function $f\{\underline{X}_3(t)\}$ could then be obtained by the
methods described for case 1.
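
One way to read the procedure of case 3 is sketched below: fit equation 31 on a short concurrent record, then apply the fitted relation to a longer record of peak discharges. All data here are synthetic and hypothetical, the function names are assumptions of this illustration, and the lognormal mean correction $\exp(\sigma^2/2)$ follows from the stated normality of $\epsilon$ rather than from anything in the paper.

```python
import math
import random

def fit_log_log(peaks, yields):
    """Least-squares fit of ln(yield) = ln(a) + b*ln(peak); returns (ln_a, b, sigma2)."""
    xs = [math.log(q) for q in peaks]
    ys = [math.log(x) for x in yields]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    ln_a = my - b * mx
    resid = [y - (ln_a + b * x) for x, y in zip(xs, ys)]
    sigma2 = sum(r * r for r in resid) / (n - 2)
    return ln_a, b, sigma2

def mean_yield_per_event(ln_a, b, sigma2, long_peaks):
    """Average of E[X | q_p] = a * q_p**b * exp(sigma2 / 2) over a long peak record."""
    terms = [math.exp(ln_a + b * math.log(q) + sigma2 / 2) for q in long_peaks]
    return sum(terms) / len(terms)

# Synthetic data standing in for real records (all values hypothetical).
rng = random.Random(31)
true_ln_a, true_b, true_sigma = math.log(2.0), 1.5, 0.3
long_peaks = [rng.lognormvariate(3.0, 0.8) for _ in range(2000)]
short_peaks = long_peaks[:200]          # the short concurrent record
short_yields = [math.exp(true_ln_a + true_b * math.log(q) + rng.gauss(0.0, true_sigma))
                for q in short_peaks]
ln_a, b, sigma2 = fit_log_log(short_peaks, short_yields)
estimate = mean_yield_per_event(ln_a, b, sigma2, long_peaks)
print(ln_a, b, sigma2, estimate)
```

The resulting mean yield per event could then be combined with the moments of the counting process as in case 1.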

Example 

As  an  example  of  the  application  of  this  model 
to  a  real  situation,  let  us  consider  the  data  available 
from  two  small  watersheds  at  the  Central  Great 
Plains  experimental  watershed  near  Hastings, 
Nebr.  The  two  watersheds,  W-3  and  W-5,  have 
drainage  areas  of  481  acres  and  411  acres,  respec- 
tively, and  are  described  in  a  USDA  publication 
(14).  Both  watersheds  are  mixed  cover  agricultural 
areas  on  loessial  soils  of  predominantly  silt  loam 
texture.  Average  land  slopes  are  between  5  and  6 
percent. Corn, oats, wheat, and sorghum are grown
on  the  cultivated  land.  Before  1947,  both  water- 
sheds were  farmed  in  straight  rows  with  little 
regard  for  soil  conservation.  This  system  was 
continued  in  W-3,  but  conservation  practices  in- 
cluding terraces,  waterways,  contour  tillage,  and 
land  use  conversions  were  established  on  about  85 
percent  of  W-5. 

Precipitation  and  runoff  records  are  available  for 
both  watersheds  for  the  period  1939-63,  inclusive. 
Sediment  load  measurements  were  made  with  a 
hand  sampler  and  a  single-stage  sampler  and  are 
available  for  the  period  1957-63  inclusive.  The 
sediment  yield  data  are  shown  in  table  1. 

Table 1. Sediment yield data

Year                            W-3       W-5
                                Tons      Tons
1957                            6,373     4,791
1958                            1,220       961
1959                            2,582     2,663
1960                            7,419     2,758
1961                            2,270       500
1962                            2,289       150
1963                            3,565        70
Mean                            3,674     1,695
Standard deviation              2,324     1,760
Standard deviation of mean        879       665

If the sediment yield of the two watersheds
before treatment was identical, these data indicate
that the conservation practices reduced annual
sediment yield by nearly 2,000 tons. However, the
standard deviations of the means are large because
of the short period of record. This raises the
following question: Can we obtain better estimates
of the mean sediment yields from these watersheds
by utilizing the stochastic model of sediment yield
and the longer records of runoff and precipitation?
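
The summary rows of table 1 can be recomputed from the seven annual values. The short sketch below does so with the sample standard deviation (divisor $n - 1$), which is an assumption about how the printed values were obtained; the results agree with the table to within a few tons.

```python
import statistics

# Annual sediment yield in tons, 1957-63, transcribed from table 1.
yields = {
    "W-3": [6373, 1220, 2582, 7419, 2270, 2289, 3565],
    "W-5": [4791, 961, 2663, 2758, 500, 150, 70],
}

def summarize(values):
    """Return (mean, sample standard deviation, standard deviation of the mean)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean, sd, sd / len(values) ** 0.5

summary = {name: summarize(vals) for name, vals in yields.items()}
for name, (mean, sd, se) in summary.items():
    print(f"{name}: mean={mean:.0f} sd={sd:.0f} sd_of_mean={se:.0f}")
```

With only seven years of record, the standard deviation of each mean is a large fraction of the mean itself, which is the motivation for the question posed above.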

The first step in calculating the mean and variance
of the annual sediment yield is to calculate parameters
of the sediment yield counting process $N_3(t)$.
The sample mean number of rainfall events per year,
$\Lambda_1$, is 64.69. The sample statistics for the runoff
events are shown in table 2. A chi-square test
indicated that it is very unlikely that the runoff
counting processes for the Hastings watersheds are
Poissonian; consequently, we use the counting
process given by equation 23.

The parameters $\alpha$ and $\beta$ of the counting process
$N_2(t)$ (equation 23) were calculated by the method
of moments. Cumulative distributions of the
number of events per year are shown in figure 4.
Utilizing equations 28 and 29, we find the computed
mean and variance of the annual sediment yield
are as follows:

Watershed    Mean             Variance        Standard deviation
             Tons per year                    Tons per year
W-3          2,413            3.18 × 10^6     1,783
W-5          1,070            1.34 × 10^6     1,158

The mean and variance for the 7-year period of
record are shown in table 1.

The difference between the mean and variance
computed by equations 28 and 29 and the sample
values obtained from the short period of record
can be attributed to the large number of runoff
events that occurred during the period 1957-63.
Part of this difference may be due to obvious
inconsistencies in the definitions of runoff events
during tabulation of the data. Therefore, these
estimates should be considered as merely an
example.

The above calculations were carried out using the
sediment yield model 1, assuming that the sediment
yield per event is an independent, identically
distributed random variable. Although it is beyond
the scope of this paper to investigate in detail
models 2 and 3, we show an example of the correlation
between sediment yield per event and the
peak discharge in the scatter diagram of figure 5.
From this diagram, it appears that the relationship
given by equation 31 would adequately represent
the data for this watershed.

Figure 4. Cumulative distributions of $N_2(t)$ for 1 year.


Table 2. Sample statistics for the runoff process

Watershed    Mean N2    Variance Var{N2}    α       β
W-3          10.0       26.0                4.14    27.7
W-5           7.0       15.4                4.12    48.0
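
The text does not spell out the estimator behind the values of α and β in table 2. One possible method-of-moments reading, sketched below, inverts equations 24 and 25 under the assumption that the rainfall count is Poissonian with mean $\Lambda_1$, so that $E\{N_1\} = \Lambda_1$. The function names are hypothetical, and the check is a round trip on chosen parameter values rather than a reproduction of table 2.

```python
def beta_moments(alpha, beta):
    """E{pi} and E{pi^2} for the beta density of equation 21."""
    m1 = (alpha + 1) / (alpha + beta + 2)
    m2 = m1 * (alpha + 2) / (alpha + beta + 3)
    return m1, m2

def runoff_moments(lam, alpha, beta):
    """Equations 24 and 25 with E{N1} = lam: mean and variance of N2."""
    m1, m2 = beta_moments(alpha, beta)
    mean = m1 * lam
    second = m1 * lam + lam ** 2 * m2
    return mean, second - mean ** 2

def fit_alpha_beta(lam, mean_n2, var_n2):
    """Invert equations 24 and 25 for alpha and beta (requires var_n2 > mean_n2)."""
    m1 = mean_n2 / lam
    m2 = (var_n2 + mean_n2 ** 2 - mean_n2) / lam ** 2
    s = (m1 - m2) / (m2 - m1 ** 2)      # s = alpha + beta + 2
    a = m1 * s                          # a = alpha + 1
    return a - 1, s - a - 1

lam = 64.69                             # sample mean number of rainfall events per year
mean_n2, var_n2 = runoff_moments(lam, 4.0, 30.0)
alpha_hat, beta_hat = fit_alpha_beta(lam, mean_n2, var_n2)
print(mean_n2, var_n2, alpha_hat, beta_hat)
```

The inversion is defined only when the sample variance of the runoff count exceeds its mean, which is the same overdispersion that leads to rejecting the Poisson hypothesis for the runoff process.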

Summary  and  Discussion 

Hydrologists and engineers are frequently called
upon  to  estimate  the  accumulated  sediment  yield 
from  a  watershed  in  some  time  interval.  In  these 
applications,  the  variation  of  the  sediment  yield 
as  well  as  the  mean  value  may  be  an  important 
consideration. 

In  this  paper,  it  is  suggested  that  sediment  yield 
from  an  ephemeral  watershed  may  be  treated  as  the 
sum  of  a  random  number  of  random  variables.  In 
this  formulation,  the  distribution  of  the  sediment 
yield  per  event  must  be  obtained  empirically, 
but  the  sediment  yield  counting  process  can  be 
related  to  the  precipitation  counting  process  by  a 
rainfall-runoff  model. 

Three progressively more complicated rainfall-runoff
models were postulated: (1) the pure threshold
model, (2) the general threshold model, and
(3) the infiltration model. Under the assumption
that the rainfall event counting process is Poissonian,
analytic results for the runoff counting process
were obtained for the pure threshold model. It was
hypothesized that a counting process based upon
the negative hypergeometric distribution for the
conditional probability $P\{N_2(t) = \nu \mid N_1(t) = n\}$
would have parameters that could be related to
soil and rainfall intensity characteristics. The
resulting runoff counting process gave a good fit to
data from two small watersheds near Hastings,
Nebr.

An example is given of the application of the
stochastic model of sediment yield to the problem
of estimating mean and variance of annual sediment
yield if only short concurrent records of sediment
yield and runoff are available, but longer records
of precipitation and runoff can be utilized.

The  models  presented  in  this  paper  must  be  con- 
sidered as  a  preliminary  approach  to  the  problem 
of  developing  an  analytically  tractable  stochastic 
description  of  the  sediment  yield  process.  Efforts 
should  continue  to  develop  analytic  solutions  for 
the  runoff  counting  process  utilizing  the  general 
threshold  and  infiltration  models.  The  counting 
process  based  upon  the  negative  hypergeometric 
distribution  appears  to  fit  observational  data  fairly 
well.  It  should  be  tested  more  extensively  to  observe 
the variation of parameters under various climatological
and geological conditions.

Sediment yield data should be analyzed to determine
in which regions the assumption that yield
per event is an independent and identically
distributed random variable is valid.
Appendix A

Derivation of First and Second Moments
of $N_2(t)$

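The mean of the runoff counting process is given
by (see equation 23)

$$E[N_2(t)] = \sum_{n=1}^{\infty} \exp\{-\Lambda_1(t)\} \frac{\Lambda_1(t)^n}{n!} \sum_{\nu=1}^{n} \nu\, \frac{\dbinom{n-\nu+\beta}{n-\nu}\dbinom{\nu+\alpha}{\alpha}}{\dbinom{n+\alpha+\beta+1}{n}}. \tag{A-1}$$

First consider the case when $\beta = 1$:

$$E[N_2(t)] = \exp\{-\Lambda_1(t)\} \sum_{n=1}^{\infty} \Lambda_1(t)^n\, \frac{(\alpha+1)(\alpha+2)}{(n+\alpha+2)!} \sum_{\nu=1}^{n} \frac{(n-\nu+1)(\nu+\alpha)!}{(\nu-1)!}. \tag{A-2}$$

Now the interior sum can be written as

$$\sum_{\nu=1}^{n} \frac{(n-\nu+1)(\nu+\alpha)!}{(\nu-1)!} = (n+1)\sum_{\nu=1}^{n} \frac{(\nu+\alpha)!}{(\nu-1)!} - \sum_{\nu=1}^{n} \frac{\nu\,(\nu+\alpha)!}{(\nu-1)!}. \tag{A-3}$$

It can be easily verified that

$$\sum_{\nu=1}^{n} \frac{(\nu+\alpha)!}{(\nu-1)!} = \frac{1}{\alpha+2} \prod_{i=0}^{\alpha+1} (n+i). \tag{A-4}$$

By utilizing identity A-4 one can show that

$$\sum_{\nu=1}^{n} \frac{\nu\,(\nu+\alpha)!}{(\nu-1)!} = \frac{(n+\alpha+1)!}{(n-1)!} \left[ \frac{1}{\alpha+2} + \frac{n-1}{\alpha+3} \right]. \tag{A-5}$$

By substituting the identities of A-4 and A-5
into A-3

$$\sum_{\nu=1}^{n} \frac{(n-\nu+1)(\nu+\alpha)!}{(\nu-1)!} = \prod_{i=0}^{\alpha+1} (n+i)\; \frac{n+\alpha+2}{(\alpha+2)(\alpha+3)}. \tag{A-6}$$

When this expression is substituted into equation
A-2, we obtain for the case $\beta = 1$

$$E[N_2(t)] = \Lambda_1(t)\, \frac{\alpha+1}{\alpha+3}, \tag{A-7}$$

which is of the form

$$E[N_2(t)] = E[N_1(t)] \cdot E[\pi], \tag{A-8}$$

since

$$E[\pi^r] = \frac{(\alpha+\beta+1)!\,(\alpha+r)!}{(\alpha+\beta+r+1)!\,\alpha!}.$$

We have proved that the relation

$$\sum_{\nu=1}^{n} \frac{(n-\nu+\beta)!\,(\nu+\alpha)!}{(n-\nu)!\,(\nu-1)!} = \frac{(n+\alpha+\beta+1)!\,(\alpha+1)!\,\beta!}{(n-1)!\,(\alpha+\beta+2)!} \tag{A-9}$$

holds for $\beta = 1$. (See equation A-6.) Suppose that
A-9 holds for $\beta = \beta$ and let us prove that it is good
for $\beta + 1$ (mathematical induction):

$$\sum_{\nu=1}^{n} \frac{(n-\nu+\beta+1)!\,(\nu+\alpha)!}{(n-\nu)!\,(\nu-1)!} = \frac{(n+\alpha+\beta+2)!\,(\alpha+1)!\,(\beta+1)!}{(n-1)!\,(\alpha+\beta+3)!}.$$

The above equation can be written as:

$$(n+\beta) \sum_{\nu=1}^{n} \frac{(n-\nu+\beta)!\,(\nu+\alpha)!}{(n-\nu)!\,(\nu-1)!} - \sum_{\nu=2}^{n} \frac{(n-\nu+\beta)!\,(\nu+\alpha)!}{(n-\nu)!\,(\nu-2)!} = \binom{n+\alpha+\beta+2}{\alpha+\beta+3} (\alpha+1)!\,(\beta+1)!. \tag{A-10}$$

Consider the second term of equation A-10:
let $\nu = 1 + k$, $n^* = n - 1$, and $\alpha^* = \alpha + 1$

$$\sum_{\nu=2}^{n} \frac{(n-\nu+\beta)!\,(\nu+\alpha)!}{(n-\nu)!\,(\nu-2)!} = \sum_{k=1}^{n-1} \frac{(n-k+\beta-1)!\,(k+\alpha+1)!}{(n-k-1)!\,(k-1)!} = \sum_{\nu=1}^{n^*} \frac{(n^*-\nu+\beta)!\,(\nu+\alpha^*)!}{(n^*-\nu)!\,(\nu-1)!} = \binom{n+\alpha+\beta+1}{\alpha+\beta+3} (\alpha+2)!\,\beta!.$$

By substituting the above expression in equation
A-10 and simplifying

$$(n+\beta)\binom{n+\alpha+\beta+1}{\alpha+\beta+2} - (\alpha+2)\binom{n+\alpha+\beta+1}{\alpha+\beta+3} = (\beta+1)\binom{n+\alpha+\beta+2}{\alpha+\beta+3} \tag{A-11}$$

and utilizing the identity

$$\binom{n}{r} = \binom{n-1}{r} + \binom{n-1}{r-1}$$

we can write equation A-11 as

$$(n+\beta)\binom{n+\alpha+\beta+1}{\alpha+\beta+2} - (\alpha+2)\binom{n+\alpha+\beta+1}{\alpha+\beta+3} = (\beta+1)\left[\binom{n+\alpha+\beta+1}{\alpha+\beta+3} + \binom{n+\alpha+\beta+1}{\alpha+\beta+2}\right] \tag{A-12}$$

or

$$(n-1)\binom{n+\alpha+\beta+1}{\alpha+\beta+2} = (\alpha+\beta+3)\binom{n+\alpha+\beta+1}{\alpha+\beta+3},$$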
which proves that equation A-9 is true. From this
it follows that relation A-8 holds for arbitrary $\beta = 1, 2, \ldots$.

By virtue of equation A-9 one can easily show that

$$E[N_2(t)^2] = \Lambda_1(t)\, \frac{\alpha+1}{\alpha+\beta+2} + \Lambda_1(t)^2\, \frac{(\alpha+1)(\alpha+2)}{(\alpha+\beta+3)(\alpha+\beta+2)}, \tag{A-13}$$

which is of the form

$$E[N_2(t)^2] = E[N_1(t)] \cdot E[\pi] + E^2[N_1(t)] \cdot E[\pi^2]. \tag{A-14}$$

Appendix  B 

List  of  Symbols 

$E$                         kinetic energy of rainfall event
$f$                         infiltration capacity
$f_c$                       minimum infiltration capacity
$f(x)$                      density function of sediment yield per event
$F_3(x \mid t)$             $= P\{X_3(t) \le x\}$
$m$                         capacity of storage reservoir or threshold
$N_i(t)$, $i = 1, 2, 3$     counting process for rainfall, runoff, or sediment yield events
$L(m)$                      time between the $\nu$th and $(\nu+1)$st runoff events
$q_p$                       peak rate of storm runoff
$s(t)$                      storage in reservoir
$t$                         time
$\tau_\nu^{(1)}$, $\tau_\nu^{(2)}$, $\tau_\nu^{(3)}$    time of ending of the $\nu$th rainfall, runoff, or sediment yield event, respectively
$X_\nu^{(1)}$               volume of rainfall per unit area for the $\nu$th event
$X_\nu^{(2)}$               volume of runoff for the $\nu$th event
$X_\nu^{(3)}$               weight of sediment transported from the basin during the $\nu$th event
$X_1(t)$                    the accumulated precipitation in $(0, t]$
$X_2(t)$                    the accumulated runoff in $(0, t]$
$X_3(t)$                    the accumulated sediment yield in $(0, t]$
$\underline{X}_3(t)$        lower approximation of $X_3(t)$
$\overline{X}_3(t)$         upper approximation of $X_3(t)$
$Z_\nu^{(i)}$, $i = 1, 2, 3$    duration of the $\nu$th event
$\alpha$                    parameter
$\beta$                     parameter
$\in$                       a member of
$\xi(t)$                    the stochastic process vector
$\pi$                       probability of runoff occurring
$\Lambda_i(t)$              the expected value function of a
STATISTICAL INFERENCE ON STREAMFLOW PROCESSES
WITH MARKOVIAN CHARACTERISTICS


By J. L. Denny, C. C. Kisiel, and S. J. Yakowitz


Abstract 

We  formulate  a  model  of  certain  streamflows 
in  terms  of  a  random  process  which  is  a  function  of 
a Markov process with stationary transition
probabilities. We approximate the above process by a
higher  order  Markov  process  with  stationary 
transition  probabilities.  Using  streamflow  data 
from  measuring  stations  in  Arizona,  we  apply  the 
model  to  study  questions  about  changes  in  the 
frequency  of  moderately  long  wet  and  dry  periods, 
prediction  of  streamflow  behavior  using  only  past 
streamflow  records,  long  range  trends  of  stream- 
flow,  and  other  problems.  We  raise  but  do  not 
satisfactorily  answer  some  questions  about  approxi- 
mating the  model  by  higher  order  Markov  processes. 


Introduction 

The  purpose  of  this  paper  is  threefold:  (1)  To 
describe  our  analysis  of  the  random  and  nonrandom 
hydrological  behavior  of  two  streams  in  southern 
Arizona;  (2)  to  present  the  hydrological  and  sta- 
tistical assumptions  in  a  sufficiently  broad  manner 
to  permit  their  use  elsewhere;  (3)  to  expose  issues 
important  to  imparting  a  physical  basis  to  stochastic 
models  in  hydrology. 

The  applications  of  the  analysis  fall  into  the  areas 
of  model  building  and  nonlinear  methods  of  pre- 
dicting certain  streamflow  measurements.  Since 
this  paper  contains  a  statistical  analysis  of  the 
hydrological data and also a development of streamflow
models, the relation between these two activities
needs to be explained. This is our modus
operandi: before any model building is done, use
hydrological theory and facts to suggest the underlying
assumptions; then use, and develop if necessary,
statistical theory and methods to ascertain
the validity of these assumptions; then build the
model on those assumptions which are supported
by analysis (statistical and deterministic) of the
data.


General  Status  of  Stochastic  Models 

Many of the premises of this paper were motivated
by our review of the literature on models
for streamflow synthesis and by our prior experience
with modeling of ephemeral streamflows (see
in particular (13) and the section of this paper on the
Hydrology of the Rillito and Sabino Creeks). It is felt
that  important  questions  remain  to  be  answered 
concerning  models  of  streamflow  synthesis  not 
only  for  aridlands  but  also  for  the  more  temperate, 
hydrologic  regimes.  Whether  or  not  an  all-encom- 
passing general  theory  of  stochastic  models  of 
hydrologic  processes  can  be  developed  remains  to 
be  shown.  To  some,  such  a  goal  seems  laudable  if 
only  to  satisfy  scientific  understanding.  Generally 
speaking,  most  hydrologists  take  an  operational 
view  and  ask  that  the  chosen  stochastic  model  be 
consistent  with  one's  objective  function.  Thus,  the 
modeler  must  decide  what  hydrologic  features  of 
observed time series should be preserved: short-term
memory, long-term memory (self-similar
models), frequency distribution of extreme values
(floods and droughts), run lengths and run sums
for high and low crossings of the series (including
Poisson and non-Poisson properties), oscillatory
properties (including random phase or random
amplitude), and even nonstationarities (nature or
man-induced). The fact that relatively short
historical records are available, and these only at a
limited number of sites, results in a situation wherein
more than one model may preserve the stochastic
and deterministic properties of the historical
record. Nonetheless, these empirical results suggest
hypotheses for stochastic models with (hopefully)
parameters with at least some hydrologic meaning.
The presumption is that extreme, runs, crossing,
and dependency properties can be explicitly
predicted for each model and judged against
observation.

In  this  paper,  the  computational  focus  is  solely 
on  identifying  the  length  of  relevant  history  in  a 
sequence  of  wet  and  dry  days  (flow  or  no  flow)  and 
in  estimating  the  transition  probabilities  of  such 
sequences.  This  is  viewed  as  part  of  a  more  compre- 
hensive effort  to  model  the  random  time  of  occur- 
rence of  new  flow  events,  random  magnitude  of 
those  new  flow  events,  and  subsequent  evolution  of 
the  new  flow  events  to  "ground"  state  by  means  of 
deterministic  or  stochastic  functions  based  on  the 
hydraulics of flow in streams. More specifically,
visual  inspection  of  actual  time  series  of  flow  on 
Sabino  Creek  leads  to  the  identification  of  the 
following  flow  properties  as  meriting  preservation  in 
any  stochastic  model  (26): 

(a)  For  a  large  proportion  of  days  the  flow  is 
zero. 

(b)  The  episodes  of  increasing  flow  tend  to 
cluster  together. 

(c)  The  rate  of  decreasing  flow  increases  faster 
than  a  direct  proportionality  to  the  in- 
creasing flow  amplitude. 

(d)  The  sequence  of  flows  is  not  stationary  but 
is  stochastically  periodic. 

For the Rillito record, replace only property (c)
with:

(c')  The  decay  of  flow  is  stochastically  de- 
pendent on  history,  such  that  the  model 
of  Sabino  flows  is  a  special  case.  Aspects  of 
properties  (c)  and  (c')  are  further  dis- 
cussed later  in  section  3.2  but  computa- 
tional results  are  deferred  for  a  later  paper. 
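
The computational focus described above, estimating transition probabilities for a sequence of wet and dry days given a chosen length of history, can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name and the short record are hypothetical, and the estimate is simply the observed relative frequency for each history.

```python
from collections import Counter

def transition_probabilities(days, order):
    """Estimate P(next day is wet | previous `order` days) from a 0/1 sequence."""
    history_counts = Counter()
    wet_counts = Counter()
    for i in range(order, len(days)):
        history = tuple(days[i - order:i])
        history_counts[history] += 1
        wet_counts[history] += days[i]
    return {h: wet_counts[h] / n for h, n in history_counts.items()}

# Hypothetical record: 1 = flow observed that day, 0 = no flow.
record = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0]
first_order = transition_probabilities(record, 1)
second_order = transition_probabilities(record, 2)
print(first_order)
print(second_order)
```

Comparing the estimates for increasing `order` is one informal way to see how much of the past is relevant; a longer history splits the record into more cases, each supported by fewer observations.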

Statistical  Basis  to  the  Analysis 

The  statistical  ideas  in  this  paper  involve  some 
tests  of  hypotheses  and  estimation,  in  the  sense  of 
the  Neyman  theory,  about  Markovian  properties  of 
individual  states  of  certain  random  processes  as 
well  as  tests  of  hypotheses  and  estimation  about 
the  processes  themselves.  The  role  of  a  certain 
class  of  functions  in  hydrology,  known  as  com- 
pletely monotonic  functions  and  first  extensively 


studied  by  the  Russian  mathematician,  Bernstein 
(4),  is  discussed.  Both  statistical  and  deterministic 
viewpoints  are  employed. 

We  never  adopt  the  assumption  that  the  observ- 
ables  are  statistically  independent.  For  we  feel,  to 
paraphrase  Wiener  (25,  see  p.  1),  that  independent 
observations  occur  more  frequently  in  statistics 
than  in  nature.  Nevertheless,  even  when  there  is  a 
high  degree  of  dependence  in  the  data,  it  is  some- 
times possible  to  transform  the  data  with  small  loss 
of  information,  for  the  problem  at  hand,  so  that 
asymptotically  the  transformed  data  is  statistically 
independent. We take advantage of this possibility.

Hydrology  of  the  Rillito  and  Sabino 
Creeks 

Rillito  Creek  and  its  tributaries  (including  Sabino 
Creek)  drain  the  northern  and  eastern  parts  of  the 
Tucson  basin  and  the  nearby  Santa  Catalina, 
Tanque  Verde,  and  Rincon  Mountains.  Its  total 
drainage  area  above  the  confluence  with  the  Santa 
Cruz  River  is  918  square  miles,  and  its  average 
annual  runoff  volume  is  11,600  acre-feet  with 
standard  deviation  of  18,200  acre-feet  (9).  From  an 
elevation  of  2,300  feet  (with  respect  to  mean  sea 
level)  at  this  confluence,  the  Rillito  watershed 
extends  to  more  than  9,000  feet  above  m.s.l.  at 
summits  of  the  Santa  Catalina  and  Rincon  Moun- 
tains. About  220  square  miles  of  the  basin  is  more 
than  5,000  feet  above  m.s.l.  Sabino  Creek  near 
Tucson  drains  35.5  sq.  mi.,  eventually  contributes 
to  the  flow  of  the  Rillito,  and  has  an  average  runoff 
volume  of  about  950  acre-feet  (19). 

In  the  Tucson  basin,  there  exists  a  strong  relation 
between  precipitation  amounts  and  elevation  (23). 
At altitudes above 7,500 feet in the Santa Catalina
and  Santa  Rita  Mountains  the  average  annual  pre- 
cipitation ranges  from  25  to  30  inches;  whereas  on 
the  valley  floor  near  Tucson  the  range  is  from  10  to 
12  inches.  Pertinent  to  stochastic  modeling  is  the 
distribution  of  these  amounts  during  the  year. 
During  the  summer,  airmass  convective  systems 
result  in  high-intensity  storms  of  short  duration 
and  small  areal  extent.  Winter  storms,  arising  from 
frontal  cyclonic  activity  have  a  greater  areal 
extent  and  are  less  intense  than  summer  storms. 
On  the  order  of  0.5  to  10  percent  of  the  precipita- 
tion in  the  basin  becomes  streamflow. 

Because of the high-intensity storms in the
July-September period, a large proportion of the flood
peaks occur at this time. For the Santa Cruz River
at the Cortaro gaging station (downstream of the Santa
Cruz-Rillito confluence), 96 flood peaks of more
than 2,700 cubic feet per second (c.f.s.) were
recorded during the periods of record (1939-47 and
1950-64); 89 of these peaks were observed in
July-September (8). However, over much smaller
areas of the basin, the highest flood peaks on record
do not necessarily occur during the summer even
though a greater percentage of the population feels
the effects of floods in that period of the year. For
example, the Rillito experienced a 24,000 c.f.s.
flood on September 23, 1929 (highest during
1908-65) and the Sabino (near Tucson) had a
5,110 c.f.s. flood on March 23, 1954 (highest during
1932-65). Prediction of floods from the available
data is clouded by our inadequate knowledge
about the effects of the sand-channel character of
the Rillito and other streams, the areal extent and
meteorological character of the storms, urbanization
and ground-water pumpage in the period from
1940 to the present, and snowmelt from the mountains
in the area. In general, quantification of these
effects is a challenge.

Flood  flows  in  the  Rillito  are  considerably  affected 
by  losses  through  the  sand  channel.  Some  of  this 
lost  water  becomes  natural  recharge  to  the  uncon- 
fined  aquifer  of  the  Tucson  basin.  Since  about  the 
mid-1940's,  pumpage  for  water  supply  to  irrigation, 
municipal,  and  industrial  users  has  resulted  in  an 
uncoupling  of  the  water  table  and  stream  channels 
over  virtually  all  of  the  basin. 

Schwalen (20) reports from an analysis of well
data in the period 1959-60 along the Rillito that the
most important source of surface runoff into the
recharge area was snowmelt and winter and
spring rains from that portion of the drainage area
at higher elevations. However, estimates of recharge
range from 20 to 80 percent of the recorded
streamflow. Such discrepancies are bound to arise
when  studying  indeterminate  systems  as  exist  in 
hydrology.  Channel  losses  are  judged  to  be  quite 
important  when  constructing  stochastic  models  of 
ephemeral  flows;  flood  peaks,  hydrograph  reces- 
sions, time  to  peak  flow,  flood  volume,  and  total 
duration  of  flow  are  bound  to  be  influenced. 

Given the multiplicity of causal factors that enter
the prediction of flow volume and peak flow, one
of the simplest approaches to constructing a simulation
model of ephemeral streamflow is to use regression
procedures. Baran et al. (3) find strong statistical
correlation in a multiplicative regression model
of Rillito daily flows (for the period 1933-65); in
this model, total amount of flow (in acre-feet)
within one flow period is strongly related to the
duration of the flow period and the peak intensity
of flow. The dry period antecedent to the flow is
found to contribute insignificantly to prediction
of total flow. These results are not too surprising
to those hydrologists familiar with the shapes of
hydrographs observed in aridlands, which take the
form of a curvilinear triangle. The area (flow volume)
of this triangle is proportional to the product of the
base (duration of flow) and height (peak discharge).

Generally speaking, to date the analysis of extreme-value
properties of precipitation and streamflow
data proceeds in an empirical manner. For example,
Davis (10) finds that a log-normal distribution gives
a satisfactory description of annual flood peaks for
the Rillito; other distributions like the Gumbel are
not rejected, but the Kolmogorov-Smirnov test
strongly indicates the statistical acceptability of
the log-normal for prediction of flood frequency.
Whether such a distribution is adequate from a
decision standpoint remains to be evaluated (10).
The Soil Conservation Service (21) uses in an
empirical manner the two-parameter gamma distribution,
as judged from log-normal plots of maximum
annual volume of flow for 10 flow durations ranging
from 1 through 274 days, to develop design charts
that relate exceedance probability to volume and
duration; results are given for the Rillito and Sabino
gage sites.

Basically, the above empirical approach is not
the result of finding the maximum of an underlying
stochastic process, like the Poisson (22), Markov,
self-similar (16), stationary Gaussian (Ditlevsen,
personal communication), and so on.

Most  stochastic  models  of  streamflow  are  based 
on  annual  or  monthly  values  (average,  integrals,  or 
maxima over the time period). A frequent question
concerns the resulting form of these models if the
modeling effort were initially undertaken with the
detailed hydrograph in a minute, hourly, or daily
format. A premise of this paper and others (13, 26)
is that in ephemeral streams modeling of different
properties of sequences of these "actual" hydrographs
is an essential starting point to a better
understanding of the information loss in grosser
stochastic models for longer time periods (months
or years). For example, Kisiel et al. (13) showed
that,  over  the  period  of  record  of  monthly  Rillito 
flows,  in  53  percent  of  the  months  zero  flow  was  ob- 
served; the  percentage  is  larger  for  daily  flows 
because  in  many  of  the  months  the  flows  occur  over 
only  a  few  days.  The  range  is  from  93  percent  of 
the  time  for  no  flow  in  May  to  7  percent  of  the  time 
for  no  flow  in  August.  By  analyzing  sequences  of 
hydrographs,  rather  than  lumped  or  smoothed 
flows  over  months  or  years,  it  is  felt  that  a  sounder 
physical  basis  can  be  imparted  to  stochastic  models. 
Thus,  from  daily  flows  Baran  et  al.  (3)  find  that 
there  are  at  least  five  distinct  time  periods  during 
the year when the Rillito flow arrivals differ significantly;
furthermore, a Poisson flow arrival rate
in  summer  explains  why  the  antecedent  dry  period 
is  not  important  in  the  determination  of  flow  volume 
by  means  of  the  multiplicative  regression  model. 
Only  during  the  winter  is  the  Poisson  flow  arrival 
pattern  rejected  as  a  hypothesis  by  the  Kolmogorov- 
Smirnov  test.  One  tentative  conclusion  is  that  flow 
events  (wet  periods)  occur  in  an  independent  man- 
ner in  the  summer  and  are  not  clustered  as  in 
winter  (13). 

This  suggests  that  the  effect  of  channel  moisture 
is  not  very  different  from  one  flow  event  to  another 
in  the  summer  in  contrast  to  the  winter. 

Very little is known at present about the quantitative
effects of urbanization on streamflow in the
Tucson  basin.  A  priori,  one  may  guess  that  Rillito 
flows  would  be  affected  by  urban  development  to 
a  much  greater  extent  than  Sabino  flows  because 
the  latter  results  from  a  relatively  undeveloped 
mountain watershed. Preliminary results of an
ongoing urban runoff study in Tucson
indicate  that  water  yield  from  three  small  urbanized 
watersheds  may  be  at  least  two  to  three  times  as 
great  as  that  of  the  desert  areas  (18).  Not  possible 
as  yet  are  inferences  about  effects  on  peak  dis- 
charge and  time  to  peak. 

It  is  estimated  that  about  1,100  acre-feet  of  runoff 
are  available  annually  in  the  urban  and  southeastern 
suburban  Tucson  area.  The  above  results  are  ob- 
tained by  comparison  of  urban  runoff  with  annual 
storm  runoff  on  the  undeveloped  desert.  Atterbury 
runoff is about 2i percent of the rainfall. Many
assumptions  of  homogeneity  are  necessary  to  make 
such  comparisons,  but,  other  than  acquiring  more 


data,  the  only  other  alternative  is  a  theoretical 
analysis  of  the  urban  rainfall-runoff  process. 

Formulation  of  the  Assumptions 

This  section  describes  the  assumptions,  and 
their  consequences,  which  are  used  in  this  paper. 
The  theory  is  developed  and  described  and  is  partly 
applied  to  the  streamflow  measurements  from  the 
Sabino  Creek  in  the  next  section.  The  analyses  on 
Rillito Creek will be presented in another paper.
While  the  bulk  of  this  section  pertains  to  statistical- 
hydrological  ideas,  one  part  contains  a  discussion  of 
the  role  of  completely  monotonic  functions  in 
deterministic  hydrology,  but  even  the  deterministic 
theory  has  a  natural  statistical  interpretation  and 
extension. 

Consider a random process $\{X(t) : t \in T\}$, where $T$
is a subset of $R$, the real numbers, and $X(t)$ is a
random variable, or possibly a random vector. Of
course, $T$ is the time domain of interest and will
typically be the integers, the positive integers, or
$R$ itself. The sample paths of $\{X(t)\}$, functions of $t$,
are identified with the streamflow measurements.
Thus, the actual set of numerical data taken from
the stream's measuring station, $(x_1, \ldots, x_n)$, is
in turn identified with a sample path of a random
process $\{X_i : i = 1, \ldots, n\}$. That is, the time
domain corresponding to the random process
describing the actual measurements is the first
$n$ integers.

Basically,  the  random  processes  studied  in  this 
paper  are  higher  order  Markov  processes  with 
stationary  transition  probabilities.  As  again  pointed 
out below, stationarity of the transition probabilities
does  not  imply  stationarity  of  the  process.  Although 
the  measurements  will  be  a  finite  set,  it  is  con- 
venient, for  developing  a  theory,  to  formulate  the 
assumptions  when  T  is  possibly  infinite.  Consider 
the  following  pair  of  conditions,  the  first  being  a 
higher-order  Markov  condition  and  the  second  being 
a  stationarity  of  transition  probabilities  condition: 

(1)  $P(X(t+s) \le x \mid X(\tau) : \tau \le t) = P(X(t+s) \le x \mid X(\tau) : t-k \le \tau \le t)$

(2)  $P(X(t+s) \le x \mid X(\tau) : \tau \le t) = P(X(t+s+r) \le x \mid X(\tau) : \tau \le t+r)$

for all $x \in R$, $t \in T$, $t+s \in T$ with $s > 0$, $t+s+r \in T$ with
$r \in R$, and for a fixed nonnegative number
$k$ which does not depend on $x$, $t$, $s$, or $r$. Note that
for $k = 0$ the classical Markovian property arises.

The  number  k  can  be  thought  of  as  the  relevant 
history  of  the  process  for  all  purposes  of  prediction 
if  the  random  behavior  of  the  process  is  known  and 
will  be  a  subject  of  statistical  investigation  in  this 
paper.  The  relevant  history  includes  new  flow 
disturbances  and  recession  of  older  flow  disturb- 
ances. The  above  pair  of  conditions  can  be  re- 
written as  one  condition,  called  condition  M: 

$P(X(t+s) \le x \mid X(\tau) : \tau \le t) = P(X(t+s+r) \le x \mid X(\tau) : t+r-k \le \tau \le t+r)$

where $x$, $t$, $s$, $r$, and $k$ are as above. Condition M
can  be  described  as  follows:  If  the  actual  chance 
behavior  of  the  streamflow  measurements  is  known, 
then,  for  purposes  of  predicting  future  streamflow 
measurements,  only  the  most  recent  time  segment 
of length k is needed. Of course, the actual chance
behavior  is  not  known.  Because  of  this  fact  and 
because  the  streams  studied  here  are  described  by 
processes  which  satisfy  condition  M  and  some  addi- 
tional properties,  condition  M  deserves  more 
clarification. 

Aridland  streams  and  other  streams  behave 
differently  at  various  times  of  the  year;  however, 
certain  "seasons"  appear  in  the  annual  streamflow 
records,  although  the  seasons  are  often  ill-defined 
and  do  not  necessarily  have  a  strict  periodic  struc- 
ture. It  seems  that  the  crucial  statistical  property 
a  season  should  possess  is  a  sort  of  homogeneity  of 
behavior  over  a  sequence  of  years.  Often,  the  notion 
of  homogeneity  is  described  mathematically  as 
second-order stationarity or, perhaps, strict
stationarity. 

During  certain  seasons,  such  as  the  early  summer 
dry  season  in  southern  Arizona,  the  assumption  of 
homogeneity  is  reasonable  for  both  meteorological 
and  hydrological  activity.  However,  even  a  strict 
stationarity  in  the  random  behavior  of  meteorolog- 
ical phenomena  does  not  necessarily  imply,  it  seems 
to  us,  a  stationarity  of  the  hydrological  phenomena. 
For  example,  consider  a  season  of  precipitation 
and assume the rainfall is well described by a stationary
model. Now look at the watershed receiving
the  rainfall  and  a  stream  lying  in  the  watershed. 
The  initial  effects  of  precipitation  certainly  include 
increased  streamflow  activity.  Then  as  the  season 
of rainfall grows, the manner in which the watershed
distributes water to streams, the absorption
capacity  of  the  ground  around  the  streams,  and  the 
ability  of  a  stream  to  transmit  increased  flow  are 
quantities  which  are  changing.  Thus  it  may  well  be 
unrealistic  to  postulate  that  the  behavior  of  a  stream 
is  stationary  when  regarded  as  a  random  process, 
even  with  statistically  stationary  weather. 

On  the  other  hand,  the  future  activity  of  a  stream 
during  a  season  of  homogeneous  precipitation  does 
depend  in  a  fairly  consistent  way  on  the  past 
behavior  during  all  or  part  of  the  rainy  season. 

As an example, consider a 90-day season of homogeneous
precipitation. Let $(x_1, \ldots, x_{20})$ be 20
consecutive daily readings which may appear during
the season. Now suppose $(x_1, \ldots, x_{20})$ is observed
on the 11th through 30th days. How will the stream
behave, in the statistical sense, on the following
31st day? Next, suppose that $(x_1, \ldots, x_{20})$ is instead
actually observed on the 41st through 60th days.
Again,  how  will  the  stream  behave,  in  the  statistical 
sense,  on  the  following  61st  day?  In  some  situations, 
the  chance  behavior  of  the  stream,  still  assuming 
stationarity  of  precipitation,  will  be  the  same  on  the 
31st  and  61st  days. 

It  is  streams  with  this  behavior  which  are  studied 
in  this  paper.  Condition  M  is  almost  the  abstract 
formulation of the above except for the choice of
the integer 20. Twenty days of streamflow readings
may be inadequate or may be excessive for purposes
of prediction, but the existence of some integer k,
not necessarily known, so that condition (2) is
satisfied  is  postulated.  To  understand  this  require- 
ment, consider  a  long  sequence  of  daily  streamflow 
readings,  during  the  90-day  rainy  season,  involving 
many  years.  Suppose  that  measurements  over  this 
90-day  period  are  given  for  the  preceding  100  years. 
In a rough sense, condition M asserts that using
the information in the preceding 100 years
$(x_{-9000}, \ldots, x_{-1})$ and the most recent $k$ observations
$(x_{n-k+1}, \ldots, x_n)$, one can predict $x_{n+1}$ as well as
if one uses all the data
$(x_{-9000}, \ldots, x_{-1}, x_1, \ldots, x_{n-k}, x_{n-k+1}, \ldots, x_n)$
from the past to the present.


Deterministic Aspects of the Flow
Period

Thus  far  the  discussion  has  not  explicitly  touched 
upon  purely  hydrological  questions,  at  least  those 
independent  of  statistical  aspects,  but  the  general 
problem  at  hand  requires  at  least  a  discussion  of 
the  behavior  of  streamflow  measurements  when  no 
random  components  are  obviously  present.  Thus, 
ignoring  errors  of  measurement,  the  recession  flow 
of  the  hydrograph  in  the  absence  of  random  dis- 
turbances deserves  to  be  studied. 

The  problem  at  hand  requires  that  there  be  an 
understanding  and  also  rudiments  of  a  theory  of 
streamflow  after  all  discharges  into  the  streams 
have  ceased  and  streamflow  is  receding.  Even 
here  a  statistical  generalization  of  the  classical 
deterministic  hydraulics  naturally  arises. 

Let  us  briefly  look  at  previous  work  on  ground- 
water flow.  In  his  paper  about  baseflow  as  a  hydro- 
graph  component,  Appleby  (2)  mentions  several 
previous  endeavors  to  find  equations  which  ade- 
quately describe  groundwater  flow.  In  1906,  Maillet 
(15)  arrived  at  the  following  formula  for  recession 
prediction: 

$q(t) = q_0 + q_1/(1 + at)^2$,

where $q_0$, $q_1$, and $a$ are positive constants. This
paper, as Appleby notes, pertains to the study of
the flow from a cascade of reservoirs. In 1904,
Boussinesq  wrote  his  paper  on  the  motion  of  water 
through  the  ground.  An  asymptotic  expression  of 
the  following  form  was  obtained: 

where  the  W{x)  is  a  space  variable.  In  another 
paper, Appleby (1) found that a diffusion-type
formula: 

q{t)=qoe-J"IT''l^ 

formed  a  respectable  model  in  the  middle  range  of 
groundwater  recession. 

In  passing,  it  is  of  interest  to  note  that  Yevjevich 
(27)  used  exponential-type  recession  functions  of 
the  same  family  as  above  to  estimate  water  carry- 
over from  one  year  to  another  and  to  construct 
first-order  autoregressive  models. 


We  are  interested  in  such  functions  mentioned 
above  to  describe  streamflow  recession  in  the 
ephemeral  streams  under  study,  after  a  discharge 
of  water  into  the  streams.  What  is  meant  by  "such 
functions?" 

To answer this question consider a situation
which may be hypothetical. At each time $t > 0$, the
flow $f(t)$ of a stream at a fixed point is recorded.
Assume that $f(t) = 0$, $0 < t < t_1$, but that at time $t_1$
there is a discharge into the stream which reaches
the point at time $t_2 > t_1$. Thus $f(t) = 0$, $0 < t < t_2$.
Assume also that the discharge enters the stream
rather rapidly. What will the recession curve $f(t)$
do? First, $f(t)$ will probably move upward in an
irregular fashion and reach a maximum or point of
peak flow. Let $t_3 > t_2$ be the moment of peak
flow. Our concern in this paper is the course of
$f(t)$, $t \ge t_3$. Since there is assumed no further
discharge and since peak flow has been reached we
expect, subject to random disturbances, that
$f(t) \le f(t_3)$, if $t \ge t_3$. Regarding the observable
recession curve as a random process this becomes
the expectation $Ef(t) \le Ef(t_3)$, $t \ge t_3$. (Even if $f(t)$
is purely deterministic we clearly can write $Ef(t)$.)
Let $g(t) = Ef(t)$, and $g^{(i)}(t)$ equal to
the $i$th derivative of $g$, assumed to exist. Assuming
that the discharge spans a brief time period, it then
seems plausible that $g^{(1)}(t)$, the rate of change of
expected flow, will decrease. That is, for $i = 0, 1, 2, \ldots$,
we have $(-1)^i g^{(i)}(t) > 0$. While no
physical interpretation is possible for $g^{(i)}(t)$, for all
$i$, $g^{(2)}(t)$ is a "rate of change of a rate of change."
It is plausible that as time $t$ increases, the rate of
$g^{(i)}(t)$ is not increasing, $i = 0, 1, 2, \ldots$. This
condition becomes:

$(-1)^i g^{(i)}(t) \ge 0$

for $t > t_3$, the point of peak flow, all $i = 0, 1, 2, \ldots$.
Such functions satisfying these inequalities are
said to be completely monotonic on $(t_3, \infty)$. Are
completely monotonic functions actually relevant
to hydrology? The answer is yes, since all of the
functions described in the above paragraph are
completely monotonic on $(0, \infty)$. Thus, completely
monotonic functions have in fact been used by
hydrologists in deterministic situations since the
beginning of the century.

We prefer to think of completely monotonic
functions such as $\sum_{i=1}^{n} b_i e^{-a_i t}$, where $n$, $a_i$, and
$b_i$ are positive, not as describing the actual observed
recession curve but rather as describing the expected
value of the recession curve.
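
As a worked check (our own illustration, not a computation taken from the paper), two recession families of the kind discussed above can be differentiated directly to confirm the alternating-sign condition. The symbols $a$, $b$, and $m$ are assumed to be positive constants; $m$ is a hypothetical generic exponent introduced only for this sketch.

```latex
% Assumptions: a > 0, b > 0, m > 0, t >= 0, i = 0, 1, 2, ...
% For i = 0 the product m(m+1)...(m+i-1) is empty and equals 1.
\begin{align*}
\frac{d^i}{dt^i}\bigl(b\,e^{-at}\bigr)
  &= (-1)^i\, b\, a^i e^{-at},
  &(-1)^i \frac{d^i}{dt^i}\bigl(b\,e^{-at}\bigr)
  &= b\, a^i e^{-at} \ge 0, \\
\frac{d^i}{dt^i}\,(1+at)^{-m}
  &= (-1)^i\, m(m+1)\cdots(m+i-1)\, a^i\, (1+at)^{-(m+i)},
  &(-1)^i \frac{d^i}{dt^i}\,(1+at)^{-m}
  &\ge 0 .
\end{align*}
```

Because the inequalities are preserved under sums with positive weights and under the addition of a nonnegative constant, finite sums of such terms are again completely monotonic on $(0, \infty)$.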

Two additional comments will be made. First,
the theorem of Bernstein (for example, see (17),
p. 11, or (24), p. 160) ensures that on each bounded
interval, each completely monotonic function $g(t)$
can be approximated uniformly by weighted sums
of exponentials. More precisely: For each interval
$0 < t < T < \infty$ and each $\epsilon > 0$ there are positive
numbers $a_i$ and $b_i$, $i = 1, \ldots, n$, such that

$\left| g(t) - \sum_{i=1}^{n} b_i e^{-a_i t} \right| < \epsilon$ for all $0 < t < T$.

That is, for purposes of approximation, we can use functions
of the form $\sum_{i=1}^{n} b_i e^{-a_i t}$ for suitable $a_i$ and $b_i$
depending on $T$ as well as $g$.
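
The approximation statement can be illustrated numerically. The following is a minimal sketch under our own assumptions, not a procedure from the paper: it takes the completely monotonic function g(t) = 1/(1 + t), which equals the integral of exp(-s t) exp(-s) over s from 0 to infinity, and replaces the integral by a midpoint-rule sum. The result is a weighted sum of exponentials with positive rates and weights. The function names, the truncation point `s_max`, and the number of terms are hypothetical choices.

```python
import math


def exponential_sum_for_reciprocal(n_terms=3000, s_max=30.0):
    """Return rates a_i and weights b_i with sum_i b_i*exp(-a_i*t) ~ 1/(1+t).

    Assumption: midpoint-rule quadrature of the identity
    1/(1+t) = integral_0^inf exp(-s*t) * exp(-s) ds, truncated at s_max.
    """
    step = s_max / n_terms
    rates = [(i + 0.5) * step for i in range(n_terms)]   # a_i > 0
    weights = [step * math.exp(-a) for a in rates]       # b_i > 0
    return rates, weights


def evaluate(rates, weights, t):
    """Evaluate the weighted sum of exponentials at time t."""
    return sum(b * math.exp(-a * t) for a, b in zip(rates, weights))


def max_error(rates, weights, t_max=5.0, n_grid=51):
    """Largest absolute deviation from 1/(1+t) on a grid over [0, t_max]."""
    grid = [t_max * j / (n_grid - 1) for j in range(n_grid)]
    return max(abs(evaluate(rates, weights, t) - 1.0 / (1.0 + t)) for t in grid)


rates, weights = exponential_sum_for_reciprocal()
print("largest deviation on [0, 5]:", max_error(rates, weights))
```

Shrinking the step (more terms) tightens the bound, and a longer interval generally requires more terms, which mirrors the dependence of the $a_i$ and $b_i$ on $T$.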

The second comment is an admission of ignorance.
We do not know what it means for a random
process to satisfy $E(X(t)^{(i)}) = (EX(t))^{(i)}$, where
the first mentioned derivative may be in the sense
of generalized functions and the latter derivative
is, of course, the condition of complete monotonicity.
Rephrasing the sentence in a question: Which
generalized random processes $\{X(t)\}$ satisfy
$E(X(t)^{(i)}) = g^{(i)}(t)$ for $i = 0, 1, 2, \ldots$? This
set of equations arises, of course, from interchanging
integration and differentiation in
$g^{(i)}(t) = (Ef(t))^{(i)}$ to obtain $E(f^{(i)}(t))$. The answer to

these  questions  would  provide  all  generalized 
random  processes,  which  on  the  average  behave  as 
a  completely  monotonic  function.  The  answer  would 
give us, for example, random models of a recession

if  we  set  g(t)a  — .  To  specialize  to  the  deterministic 

interpretation  as  used,  for  example,  by  Maillet 
it  is  only  necessary  to  regard  the  recession 
curve as nonrandom: $P(f(t) = Ef(t), \text{ for all } t) = 1$.

The recession curve of streamflow after a new
discharge  plays  a  role  in  the  theory  and  in  some  of 
the  streamflow  modeling.  Statistical  questions 
about  the  recession  curve  are  not  treated  in  this 
paper,  but  a  statistical  study  of  the  hydrological 
parameters of streamflow will clearly be useful.


The  methods  in  this  paper  are  to  be  used  by  the 
authors  for  such  a  study  in  a  subsequent  paper. 

As  a  final  comment,  note  that  the  notion  of 
"memory  of  a  hydrograph"  is  by  no  means  a  new 
one.  The  autocorrelation  function  in  second-order 
stationary  processes  is  a  natural  paradigm  of  a 
hydrograph's memory. The literature on estimating
the autocorrelation function is, of course, extensive.
Another possible approach to memory can be found
in the work of Mandelbrot and Wallis (16). Specifically,
the Hurst range has been suggested as a
tool for studying what is intuitively often described
as "persistence." More precisely, if $R(n)$ is the
Hurst range of the data, then asymptotically $R(n)$
equals $\beta n^{\alpha}$ for positive constants $\beta$ and $\alpha$, and
$0 \le \alpha \le 1$. If the data are independent (no memory),
then Feller (11) has shown that $\alpha = 1/2$. The celebrated
data of Hurst (1951) give $\alpha$ greater than 1/2
and  this  obviously  implies  some  sort  of  persistence 
or  memory. 
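
To make the quantity concrete, here is a minimal sketch of one common definition of the range: the spread between the largest and smallest cumulative departure from the sample mean. This definition (the "adjusted" range) is our assumption; the text above does not spell out which variant is intended, and the function name is hypothetical.

```python
def hurst_range(values):
    """Adjusted range of cumulative departures from the sample mean.

    Assumption: R(n) = max_k S_k - min_k S_k, where S_0 = 0 and
    S_k = sum_{i<=k} (x_i - mean), k = 1, ..., n.
    """
    n = len(values)
    mean = sum(values) / n
    partial = 0.0
    highest = 0.0  # S_0 = 0 is included in the maximum
    lowest = 0.0   # and in the minimum
    for x in values:
        partial += x - mean
        highest = max(highest, partial)
        lowest = min(lowest, partial)
    return highest - lowest


# Hypothetical toy series, not data from the paper.
print(hurst_range([1, 2, 3, 4]))
```

Plotting log R(n) against log n for growing n and reading off the slope is the usual informal way of estimating the exponent $\alpha$.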


The   Statistical    Experiment   on  the 
Hydrograph 

The preceding remarks are perhaps too general.
To  offset  this  possibility,  we  now  formulate  the 
statistical  experiment  used  in  this  paper. 

The time domain, $T$, of interest is the integers
$\{\ldots, -1, 0, 1, \ldots, n, \ldots\}$. Associated
with each integer $n$ is a random variable $X_n$. The
parameter space, $\Theta$, of this experiment is the set

$\{(k, \{p_{i_1, \ldots, i_{k+1}}\}) : k = 1, 2, \ldots\}$,

where $\{p_{i_1, \ldots, i_{k+1}}\}$
are stationary transition probabilities for a Markov
chain of order $k$. Moreover, the transition probabilities
relate to a Markov chain taking only the
integer values 0 and 1. Thus, it is assumed that for
some fixed integer $k$ and set of transition
probabilities $\{p_{i_1, \ldots, i_{k+1}}\}$:

$P_\theta(X_n = a_n \mid X_{n-1} = a_{n-1}, X_{n-2} = a_{n-2}, \ldots)$
$= P_\theta(X_n = a_n \mid X_{n-1} = a_{n-1}, X_{n-2} = a_{n-2}, \ldots, X_{n-k} = a_{n-k})$
$= p_{a_{n-k}, \ldots, a_{n-1}, a_n}$,

where each $a_i = 1$ or 0, that is, wet or dry. By definition
$\theta = (k, \{p_{i_1, \ldots, i_{k+1}}\})$.

Thus, the parameter set, the probabilities associated
with the parameter set, and the space on
which the probabilities are defined (namely, the
infinite  sequences  of  ones  and  zeros)  are  all  defined. 
This  is,  by  definition,  the  statistical  experiment  on 
which  statistical  inference  is  performed  in  this 
paper. 
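
To fix ideas, a sample path from such an experiment can be generated as follows. This is a minimal sketch under our own assumptions: the chain is started from k dry days, and the dictionary `p_wet` (a hypothetical name) maps each state, a tuple of the k most recent readings, to the probability that the next reading is wet.

```python
import random


def simulate_wet_dry(p_wet, k, n, seed=0):
    """Simulate n readings (1 = wet, 0 = dry) of an order-k binary chain.

    p_wet[state] is the probability that the next reading is 1, given
    that the k most recent readings (oldest first) equal `state`.
    Assumption: the chain starts from k consecutive dry readings.
    """
    rng = random.Random(seed)
    history = [0] * k
    readings = []
    for _ in range(n):
        state = tuple(history)
        reading = 1 if rng.random() < p_wet[state] else 0
        readings.append(reading)
        history = history[1:] + [reading]  # keep only the last k readings
    return readings


# Hypothetical order-2 parameters: wet days tend to follow wet days.
example = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.8}
print(simulate_wet_dry(example, k=2, n=30))
```

The transition probabilities do not depend on the time index, which is the stationarity-of-transitions assumption; the simulated path itself need not be stationary because of the fixed starting state.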

The methods used for inference about $p_{i_1, \ldots, i_{k+1}}$,
for known $k$, are those to be found in Billingsley (5, 6),
for  example.  The  methods  used  for  inference  about 
the  parameter  k  are  natural  ones  used  possibly  by 
others.  However,  no  actual  references  are  known. 
Actually,  it  is  necessary  to  generalize  the  above 
method  for  our  applications. 

Statistical  Inference  on  the  Data  From 
the  Sabino  Creek 

The  purpose  of  this  section  is  to  give  the  con- 
clusions of  the  statistical  analysis  of  the  wet-dry 
readings  from  Sabino  Creek  along  with  a  discussion 
of  the  statistical  methods  used. 

The  basic  questions  studied  are  (1)  the  question 
of  stationarity  of  the  transition  probabilities  over 
different  decades  and  (2)  the  question  of  finding  the 
constant k, the order of the Markov chain. The
underlying  assumptions  lead  to  statistical  tests, 
which  test  simultaneously  both  stationarity  and  the 
relevant  history. 

Clearly,  the  random  behavior  of  transitions  from 
wet  to  dry  will  depend  on  the  "season"  or  period  of 
time  studied.  Therefore,  this  paper  looks  at  station- 
arity and  memory  questions  within  a  fixed  season. 
The  definition  of  season  will  be  seen  below  to  be 
arbitrary, and, consequently, the criteria for defining
seasons are also tested statistically.

The  data  from  the  Rillito  Creek  extend  over  10 
years  more  than  does  the  Sabino  data,  and  more 
comparisons  are  made  for  the  Rillito.  In  this  paper, 
we  consider  the  Sabino  Creek  only. 

During  the  winter  rainy  season,  which  typically 
begins  during  the  middle  of  December  and  persists 
through  March,  it  appears  plausible  to  consider 
the  possibility  of  stationarity  of  transitions  and 
finite  history.  Since  the  initial  and  terminal  points 
of  the  rainy  seasons  are  not  well  defined,  attention 
is  directed  to  a  subinterval  within  the  season.  We 
study  the  behavior  of  Sabino  Creek  wet-dry  readings 
during  January  and  February.  A  comparison  is 
made  of  the  decades  1946-55  and  1956-65  in  the 
following fashion. Fix the integer $k$ at 1, 2, . . .,
or 6. From the 590 daily streamflow readings during
January and February from 1946-55 (ignore leap
years in this discussion to keep matters simpler)
restrict attention to the wet-dry aspect of the
readings. We thus have a sequence of 590 ones
(wet) and zeroes (dry). In the natural way, compute
the proportion $P_i(k)$ of times state $i$, a sequence of
$k$ 1's and 0's, occurs. There are $2^k$ such $P_i(k)$.
Compute also, for each state $i$ of $k$ 1's and 0's, the
proportion $P_{ij}(k)$ of times state $j$ next occurs. Note
that from a state there are precisely two states
which may be reached by moving one reading to
the right.
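
The counting just described can be sketched as follows. This is an illustration under our own conventions, not the authors' program: a state is a tuple of k consecutive readings, and a transition is recorded by the single reading that follows, which determines the next state. The function names and the toy sequence are hypothetical.

```python
from collections import Counter


def state_proportions(readings, k):
    """Proportion of length-k windows equal to each state."""
    n_windows = len(readings) - k + 1
    counts = Counter(tuple(readings[i:i + k]) for i in range(n_windows))
    return {state: c / n_windows for state, c in counts.items()}


def transition_proportions(readings, k):
    """Proportion of times each state is followed by a wet (1) or dry (0) reading.

    Keys are (state, next_reading); the next state is state[1:] + (next_reading,).
    """
    followed = Counter()
    visits = Counter()
    for i in range(len(readings) - k):
        state = tuple(readings[i:i + k])
        followed[(state, readings[i + k])] += 1
        visits[state] += 1
    return {key: c / visits[key[0]] for key, c in followed.items()}


# Hypothetical toy sequence, not Sabino Creek data.
toy = [1, 1, 0, 1, 1, 0, 1, 1]
print(state_proportions(toy, 2))
print(transition_proportions(toy, 2))
```

For each visited state the two transition proportions sum to one, so only one of them needs to be stored per state.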

The numbers $P_{ij}(k)$ are computed with no appeal
to a theory. They are simply proportions. We will
regard them and $k$ as a possible value of the parameter
which describes the random behavior of wet-
dry  readings  during  1956-65,  under  the  assump- 
tions described  in  the  preceding  section.  That  is, 
the  theory  is  now  used  to  test  hypotheses  that  the 
sequence  of  wet-dry  readings  during  January  and 
February  of  1956-65  form  a  Markov  chain  of  order 
k.  Some  confidence  intervals  on  k  fall  out  of  this 
study.

As  noted,  the  basic  references  are  the  works  of 
Billingsley  (5,  6)  on  statistical  inference  on  Markov 
chains.  Although  Billingsley  assumes  the  chains 
are  stationary  and  ergodic,  it  is  sufficient  for  the 
asymptotic  results  to  assume  only  that  for  some 
initial  distribution  the  chains  are  stationary  and 
ergodic.  It  is  easy  to  verify  that  all  the  Markov 
chains  considered  in  this  paper  are  ergodic  when 
equipped  with  the  stationary  initial  distribution 
and that the stationary initial distributions
do exist.

There  are  several  variants  of  the  point  of  view 
taken  regarding  the  hypotheses  testing.  Before 
making  them  precise,  the  approaches  taken  can 
first  roughly  be  classified  as  follows:  (1)  Assuming 
that  the  second  decade  of  Sabino  data  is  a  sample 
from  a  Markov  chain  of  order  k,  k  unknown,  is  it 
possible  that  the  first  decade  data  gives  estimates 
of  the  parameters  of  the  second  decade  which  are 
acceptable  or  reasonable  (regard  this  as  the  actual 
parameters)?  Also,  what  is  the  smallest  k?  (2) 
Assuming  both  decades  are  samples  from  Markov 
chains  of  order  k  (which  need  not  in  fact  be  the 
same), what is the smallest integer $k$, if any, for
which we accept the hypotheses that both chains
are Markov of the same order $k$ and with the same
transition probabilities? (3) In terms of confidence
intervals, when do interval estimates of $k$ and
$\{p_{i_1, \ldots, i_{k+1}}\}$ taken from the second decade
include the first decade?

The  above  is  a  first  description  of  the  program  and 
it  needs  to  be  elucidated.  The  method  of  approach 
basically  relies  on  the  fact  that  a  central  limit 
theorem  for  Markov  chains  is  available.  It  is  this 
fact  which  permits  asymptotic  tests  of  hypotheses 
and  confidence  intervals.  We  work  with  Markov 
chains  of  higher  order. 

This is done by using the results on pages 28
and 29 of Billingsley (5) together with theorem 3.1
of the same reference. In the notation of that
theorem, let $f_{i_1, \ldots, i_{k+1}}(k)$ equal the number of
occurrences whereby the sequence $(i_1, \ldots, i_k)$
of length $k$ (each $i_j = 1$ or 0) is succeeded by the
sequence $(i_2, \ldots, i_{k+1})$. Let

$f_{i_1, \ldots, i_k}(k) = f_{i_1, \ldots, i_k, 1}(k) + f_{i_1, \ldots, i_k, 0}(k)$,

the total number of occurrences of state $(i_1, \ldots, i_k)$ during the
second decade.

Let

$h(i_1, \ldots, i_k) = \sum_{j=0}^{1} \frac{\left[ f_{i_1, \ldots, i_k, j}(k) - f_{i_1, \ldots, i_k}(k)\, p_{i_1, \ldots, i_k, j}(k) \right]^2}{f_{i_1, \ldots, i_k}(k)\, p_{i_1, \ldots, i_k, j}(k)}$.

Then for each of the $2^k$ different states, $h(i_1, \ldots, i_k)$
is asymptotically distributed as chi-square with
one degree of freedom ($\chi^2_1$) and is asymptotically
independent, provided $\{p_{i_1, \ldots, i_{k+1}}\}$ is the true
parameter. (This is theorem 3.1 of Billingsley (5).)
Here, the term "asymptotically" refers to a large
sample for the second decade. (In fact, recall that
the actual sample size is 590, approximately.) Of
course $h(i_1, \ldots, i_k)$ is the usual $\chi^2_1$ approximation
statistic which, since the degree of freedom is
one, has also the form

$h(i_1, \ldots, i_k) = \left[ \frac{f_{i_1, \ldots, i_k, 1}(k) - f_{i_1, \ldots, i_k}(k)\, p_{i_1, \ldots, i_k, 1}(k)}{\sqrt{f_{i_1, \ldots, i_k}(k)\, p_{i_1, \ldots, i_k, 1}(k)\, (1 - p_{i_1, \ldots, i_k, 1}(k))}} \right]^2$,

the square of a quantity which is asymptotically
normal with mean zero and variance one, provided
the $\{p_{i_1, \ldots, i_{k+1}}\}$ are the true parameters.
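
A minimal sketch of the statistic for a single state follows, written both as the two-term chi-square sum and as the square of a standardized count. The function and argument names are our own: `f_wet` is the number of times the state was followed by a wet reading, `f_total` is the number of occurrences of the state, and `p_wet` is the hypothesized probability of a wet reading following the state. The example numbers are hypothetical.

```python
def h_two_term(f_wet, f_total, p_wet):
    """Chi-square sum over the two possible next readings (wet, dry)."""
    f_dry = f_total - f_wet
    p_dry = 1.0 - p_wet
    wet_term = (f_wet - f_total * p_wet) ** 2 / (f_total * p_wet)
    dry_term = (f_dry - f_total * p_dry) ** 2 / (f_total * p_dry)
    return wet_term + dry_term


def h_squared_normal(f_wet, f_total, p_wet):
    """Same statistic as the square of a standardized binomial count."""
    z = (f_wet - f_total * p_wet) / (f_total * p_wet * (1.0 - p_wet)) ** 0.5
    return z * z


# Hypothetical counts: a state seen 50 times, followed by a wet day 30 times,
# tested against a hypothesized wet probability of 0.5.
print(h_two_term(30, 50, 0.5), h_squared_normal(30, 50, 0.5))
```

The two forms agree because the wet and dry deviations are equal in magnitude and 1/p + 1/(1 - p) = 1/(p(1 - p)). Both forms require 0 < p_wet < 1.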


Since the $h(i_1, \ldots, i_k)$ are asymptotically
independent and asymptotically normal under the
hypotheses, we test the underlying hypotheses by
testing if the h's are exactly independent and
normal. This will be done, but a misunderstanding
may arise and needs to be discussed.

First the tests will be various $\chi^2$ tests, a test of
equality of variances (all equal to one) from normal
populations  with  mean  zero  and  the  Kolmogorov- 
Smirnov  test.  The  point  to  emphasize,  particularly 
with  reference  to  the  Kolmogorov-Smirnov  test,  is 
that  very  little  is  known  about  the  power  of  the  test, 
including  unbiasedness,  when  the  data  is  not 
exactly  independent. 

Finally,  another  reservation  and  its  operational 
consequences  need  to  be  discussed.  In  a  sense,  we 
are  interested  in  Markovian  properties  of  the  states 
of  the  process,  not  of  the  process.  This  ambiguous 
sentence  must  be  clarified. 

In  testing  hypotheses  about  the  order  and  the 

transition  probabilities  of  a  higher  order  Markov 
chain, all states normally are considered. That is,
the  transition  behavior  for  each  pair  of  states  is  a 
factor  to  be  considered.  However,  in  the  data  of 
this  paper  and  presumably  the  data  in  other  Mar- 
kovian-like  processes  with  a  moderately  large 
number  of  states,  the  number  of  occurrences  of  a 
given  state  may  be  very  small.  The  asymptotic 
theory  then  becomes  more  difficult  to  accept. 
Therefore,  in  this  paper  it  was  decided  to  compare 
states  and  their  transitions  only  in  certain  states. 

It  is  very  important  to  emphasize  that  this  modi- 
fication, considering  states  with  a  sufficiently  large 
number  of  occurrences,  can  be  made  so  that  none 
of  the  statements  about  level  of  significance  need 
be  qualified  on  account  of  this  modification.  In 
other  words,  the  experiment  was  designed  taking 
this modification into account. Where the effects
of this modification do occur is in the acceptances
of the hypotheses that the second decade of data
is Markov of order k when only a subset of the states
is tested. The computer simulation of the wet and
dry runs occurring during the summer at the Sabino
measuring station, to be described, is the strongest
justification that the entire process is Markov.

In order to test stationarity and the Markov
property of order k, ideally one should appeal to
the Neyman-Pearson lemma and its consequences
(see (14), especially p. 63); however, as noted
above, the methods in this paper are (partially)
methods,  although  it  is  possible  that  there  is  some- 
times agreement  in  the  two  approaches  asymp- 
totically. It  will  also  be  seen  that  the  test  of  equality 
of  the  parameters  of  two  distributions  is  the  actu- 
ally desired  test  instead  of  our  approach  of  substi- 
tuting estimates  of  parameters  for  true  unknown 
parameters. 

The  above  remarks,  if  unclear,  should  be  ex- 
plained as  the  study  of  the  Sabino  Creek  during 
January  and  February  is  now  described. 

From the first 10 years of streamflow measurements,
the proportions $P_i(k)$ are computed, for
each of the $2^k$ states, $k = 2, \ldots, 5$. Also, the
proportions $P_{ij}(k)$ are computed for each of the
$2^{k+1}$ possible transitions. Notice that while the
chain possesses $2^k$ states, the number of potentially
nonzero entries in the transition probabilities matrix
is only $2^{k+1}$ instead of $(2^k)^2 = 2^{2k}$. Thus, the
transition probabilities matrix is not so complex as at
first might appear. For example, if $k = 5$ there are 32 states
and potentially the matrix of transition probabilities
would possess $(32)^2$ = 1,024 entries. In fact, only
64 of the 1,024 entries can possibly be nonzero.
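
The count of admissible transitions can be checked by direct enumeration. The sketch below uses our own representation: a state is a tuple of k readings, and its only possible successors are obtained by dropping the oldest reading and appending a 0 or a 1. The function name is hypothetical.

```python
from itertools import product


def allowed_transitions(k):
    """List all states of k binary readings and all (state, successor) pairs."""
    states = list(product((0, 1), repeat=k))
    pairs = []
    for state in states:
        for reading in (0, 1):
            # Shift one reading to the right: drop the oldest, append the new one.
            pairs.append((state, state[1:] + (reading,)))
    return states, pairs


states, pairs = allowed_transitions(5)
print(len(states), len(pairs), len(states) ** 2)
```

For k = 5 this enumerates 32 states and 64 admissible transitions out of the 1,024 cells of the full matrix, in agreement with the counts above.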

In  addition,  the  number  of  states  actually  used, 
because  of  an  inadequate  number  of  observables 
falling  in  the  state,  was  even  smaller.  This  opera- 
tional aspect  has  been  mentioned  above. 

Here begin the results of the statistical analysis
of  the  Sabino  Creek  for  the  period  of  January- 
February.  The  testing  is  to  judge  if  the  first  and 
second  decades  are  Markov  chains  with  the  same 
transition  probabilities  and  of  the  same  order. 
Since  our  feeling  is  that  the  data  support  the 
hypothesis  that  a  Markov  chain  of  order  5  is  ade- 
quate, this  case  is  discussed  first. 

In this case, there are 64 transition probabilities with potentially
nonzero entries. In the manner described above, the maximum likelihood
estimates of all 64 transition probabilities, for the first decade,
were made under the assumption that the first decade is itself a Markov
chain of order 5 with stationary transition probabilities. Thus if $i$
and $j$ are two states, $P_{ij}(k)$ is defined to be the proportion of
transitions from state $i$ to state $j$. If no entries in state $i$
occurred, then $P_{ij}(k)$ is defined to be zero for all $j$. Notice
that attention is hereafter restricted to states $i$ such that the
total number of occurrences of state $i$ in the first decade is at
least 11. The choice of 11 requires discussion. First, this cutoff of
11 means that the number of states studied, the "most frequently
occurring," is 12. These 12 states included 81 percent of the entire
sample for the first decade in defining the $P_{ij}(k)$ (where here
$k = 5$).
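The estimation step just described can be sketched as follows. This is a minimal illustration, not the authors' program: it assumes the record is a list of 0 (dry) and 1 (wet) days, identifies a transition from state $i$ by the day that follows it, and uses a toy record rather than Sabino Creek data.

```python
from collections import Counter

def transition_proportions(days, k, cutoff):
    """Estimate P(next day | last k days) as observed proportions.

    days   : sequence of 0/1 values (dry/wet), hypothetical input format
    k      : assumed order of the chain
    cutoff : minimum number of occurrences of a state for it to be kept
    Returns {state: {0: proportion, 1: proportion}} for retained states.
    """
    state_counts = Counter()
    transition_counts = Counter()
    for t in range(k, len(days)):
        state = tuple(days[t - k:t])
        state_counts[state] += 1
        transition_counts[(state, days[t])] += 1
    estimates = {}
    for state, n in state_counts.items():
        if n >= cutoff:  # keep only the frequently occurring states
            estimates[state] = {d: transition_counts[(state, d)] / n
                                for d in (0, 1)}
    return estimates

toy_record = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]  # made-up data
print(transition_proportions(toy_record, k=2, cutoff=3))
```

Only states that are followed by at least one more observation are counted, so the last $k$ days of the record contribute a state only through the transitions leading into them.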

The 12 $h$'s thus computed are asymptotically $\chi^2$ with one degree
of freedom and independent under the order-5 stationarity assumptions.

Statistical  analysis.  — Under  the  assumption  that 
the 12 $h$'s are exactly $\chi^2_1$ and independent, the
Kolmogorov-Smirnov test statistic shows a level of significance greater
than 0.52. That is, if the $h$'s are exactly $\chi^2_1$ and
independent, over 50 percent of the time the sample distribution
function would deviate by at least as large an amount from the true
distribution. The actual value of the Kolmogorov-Smirnov test
statistic, in the form $\sup_x |F(x) - F_n(x)|$, where $F_n$ is the
sample distribution function and $F$ the hypothesized distribution
function, is 0.18.
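The statistic $\sup_x |F(x) - F_n(x)|$ can be computed as in the sketch below, which uses the fact that a $\chi^2$ variable with one degree of freedom is the square of a standard normal variable, so that $F(x) = \mathrm{erf}(\sqrt{x/2})$. The $h$ values listed are invented for illustration; the paper does not tabulate its 12 values.

```python
import math

def chi2_1_cdf(x):
    """CDF of chi-square with 1 d.f.: P(Z^2 <= x) = erf(sqrt(x / 2))."""
    return math.erf(math.sqrt(x / 2.0)) if x > 0.0 else 0.0

def ks_statistic(sample, cdf):
    """Two-sided Kolmogorov-Smirnov distance sup |F(x) - F_n(x)|.

    The empirical distribution jumps at each ordered observation, so the
    supremum is attained just before or just at one of those points.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Hypothetical h values, for illustration only (not the paper's data).
h_values = [0.01, 0.08, 0.2, 0.35, 0.5, 0.7, 0.95, 1.3, 1.8, 2.6, 3.9]
print(ks_statistic(h_values, chi2_1_cdf))
```

Converting the statistic to a level of significance requires the null distribution of the Kolmogorov-Smirnov statistic for the given sample size, which is not reproduced here.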

Statistical  conclusion.  — The  Kolmogorov-Smirnov 
test  indicates  that  the  12  most  likely  states  indeed 
are  states  of  a  Markov  chain  of  order  5  and  transi- 
tion probabilities  are  stationary. 

Statistical analysis. — Since the 12 $h$'s are asymptotically
independent and $\chi^2_1$, their sum is asymptotically $\chi^2$ with
12 degrees of freedom. The actual value computed is 10.69, a level of
significance greater than 0.5.
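The stated level of significance can be verified directly. For an even number of degrees of freedom the upper tail probability of $\chi^2$ has a closed form, a finite Poisson-type sum; the helper below is a sketch written for this check and is not part of the original analysis.

```python
import math

def chi2_sf_even(x, df):
    """Upper tail P(chi-square_df > x) for even df.

    Uses the identity P(chi2_{2m} > x) = exp(-x/2) * sum_{j<m} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j      # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total

p_value = chi2_sf_even(10.69, 12)
print(p_value)  # exceeds 0.5, in line with the text
```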

Statistical  conclusion.  — If  the  chain  is  a  Markov 
chain of order 5, then over 50 percent of the time the $\chi^2$ test
applied to the 12 states would give more unfavorable readings. This
further supports the contention that the chain is of order 5 and with
stationary transitions.

Statistical  analysis.  — A  test  of  the  hypothesis 
that  n  independent  observations  are  normal  with 
mean  zero  and  variance  one  against  the  alternative 
that  the  observations  are  normal  with  mean  zero 
and variance $\sigma^2 \neq 1$ is given in Lehmann (14, p. 129).
Applying this test to our 12 $h$'s (the minimum $h$ is 0.0026 and the
maximum is 5.28), the hypothesis is accepted at level of significance
0.4 and rejected at level of significance 0.5.

Statistical  conclusion.  — Assuming  independent 
normal variables with mean zero, the test of equality of variances (and
equal to 1) asserts that 40 percent of the time the $h$'s would be more
significant under the hypothesis.

While  it  is  not  known,  at  least  to  us,  if  only  the 
correct  order  of  the  chain  will  give  these  results, 
the  results  certainly  support  the  hypothesis  of  a 
chain  of  order  5. 

Overall  statistical  conclusions  on  Sabino  Creek, 
comparing  1945-55  with  1956—65,  during  January- 
February. —Three  statistical  tests,  applied  to  data 
which  is  obtained  by  transforming  the  actual  sample, 
all  gave  levels  of  significance  of  at  least  0.4.  That  is, 
if  the  wet-dry  readings  from  the  second  decade 
actually  form  (exactly)  a  Markov  chain  of  order  5 
with  stationary  transition  probabilities  obtained 
from  the  first  decade,  then  over  40  percent  of  the 
time  the  data  on  the  12  most  frequent  states  would 
be  worse  (more  significant),  for  each  of  the  three 
tests. 

Further statistical analysis and conclusions on January-February data
from Sabino Creek. — This paragraph contains results of tests, and our
interpretation, about the possibility of the January-February data
being Markov of order 4 or less and possessing stationary transition
probabilities. It may be helpful, in this connection, to recall a fact:
If a Markov chain is of order $k$, then it is also a Markov chain of
order $k + m$. In a sense that we immediately make precise, the data
gave smaller levels of significance as the order was decreased. For
$k = 4$, a cutoff of 15 was used and nine states from the first decade
satisfied this requirement. The value of the Kolmogorov-Smirnov
statistic is 0.237 at a level of significance of 0.6, approximately.
The sum of $\chi^2$ tests was not used. The test for equality of
variances accepted at level 0.05 and rejected at 0.1. (This is because
the largest $h$ was 12.5.) Thus, a discrepancy appears; the
Kolmogorov-Smirnov test indicates the order is 4, while the test for
equality of variances disagrees. For $k = 3$, only the test for
equality of variances was performed and rejected at level 0.2. For
$k = 2$ the equality of variances test rejected the order at level 0.05
and similarly for $k = 1$.

Final Comments on Statistical Analysis of Sabino Creek During
January-February. — All evidence used thus far gives fairly strong
support to the hypothesis that the January-February Sabino data over
the second decade is a Markov chain of order 5 with transition
probabilities the same as those computed for the first decade. Thus, a
test of stationarity (more precisely, a test of a consequence of
stationarity) gives affirmative results for the first two decades.

Next are considered the streamflow measurements of Sabino Creek during
the summer season of heavy precipitation (known locally as the monsoon
season). Although the final statistical analysis followed a similar
approach to the above, the precipitation pattern led to somewhat
different conclusions. An earlier section of this paper gives more
information about the summer rainy season.

The  most  noteworthy  discovery  to  appear  from 
a  comparison  of  the  winter  and  summer  Markovian 
behavior  was  the  fairly  strong  statistical  evidence 
that  the  order  of  the  Markov  chain  describing  the 
summer  behavior  is  smaller.  In  fact  all  evidence, 
to  be  described  below,  suggests  that  a  Markov 
chain  of  order  4  with  stationary  transitions  ade- 
quately describes  the  summer  period.  There  are 
hydrological-meteorological  reasons  to  expect  that 
the  "relevant  history"  of  the  summer  rainy  season 
is  smaller.  As  noted  by  Roefs  (personal  communi- 
cation), the  summer  rainstorm  pattern  consists  of 
brief  but  recurring  heavy  rain,  and  the  effects  of 
such  storms  on  discharge  through  the  Sabino 
Creek  should  be  of  shorter  duration  than  the  effect 
of  winter  precipitation.  The  streamflow  during 
winter  reflects  the  runoff,  to  a  considerable  extent, 
of  snowpack  from  the  adjacent  Santa  Catalina 
Mountains.  Such  runoff  is  more  steady  and  slower 
and  should  reflect  a  longer  period  of  storm  behavior. 

Often, the summer rainy season begins with moderate, not small,
thundershower activity. In an effort to completely understand the
summer pattern, the definition of the summer season includes a random
starting time. For, according to the meteorological understanding of
this area, the shift in prevailing winds which portends the monsoon
occurs around June 25. The winds are identified by higher humidity and
their direction, from the Gulf of Mexico. Yet, absence of precipitation
is quite common up until July. Therefore, in order to study streamflow
behavior during the summer rainy season it was decided to define the
beginning of the season as the first day for which the measuring
station records streamflow. The termination of the summer season was
defined to be September 1.

The method of approach to studying stationarity of transition
probabilities and the order of the chain is the same as for the winter
season. The $h$-functions are thus similarly defined. One difference in
the analysis was that a smaller "cutoff" criterion was used. Even so,
the number of states analysed is actually smaller. Recall that the
cutoff is the integer such that only those states possessing at least
that many occurrences in the first decade are considered. (Recall also
that the number of occurrences in the second decade does not play a
role in the cutoff.) For the summer Sabino data, again the hypothesis
of a Markov chain of order 5 and with transition probabilities given by
the first decade is tested. Only those states are considered which
possessed at least 10 occurrences during the first decade. (Only seven
such states satisfied the criterion.)

Statistical  analysis.  — Under  the  assumption  that 
the seven $h$'s are exactly $\chi^2_1$ and independent, the
Kolmogorov-Smirnov test statistic shows a level of significance of 0.6.
That is, if the $h$'s are exactly $\chi^2_1$ and independent, over 60
percent of the time the sample distribution would deviate by at least
as large an amount from the true distribution.

Statistical  conclusion.  —  The  Kolmogorov-Smirnov 
test  indicates  that  the  seven  most  likely  states  are 
states  of  a  Markov  chain  of  order  5  and  transition 
probabilities  are  stationary. 

Statistical  analysis.  —  A  test  of  the  hypothesis, 
that  the  seven  h's  are  the  squares  of  independent 
normal  observables  with  means  zero  and  variances 
one  against  the  alternatives  of  variances  not  all 
equal to one, accepts the hypothesis at level 0.6
and  rejects  at  level  0.7.  (The  minimum  h  is  0.222 
and  the  maximum  is  3.51.) 

Statistical  conclusion.— Suppose  the  observables 
are from a chain of order 5. Then the $h$'s are asymptotically
$\chi^2_1$, since they are the squares of asymptotically normal
variables. Also, they are asymptotically independent. We test the
hypothesis that the $h$'s are squares of exactly normal independent
variables with mean zero and variance one against the alternative that
the variances are not all equal to one. If the hypothesis is true, then
over 60 percent of the time the $h$'s would be more significant. The
test for variances all equal to one indicates the states tested are
states of a chain of order 5.

Withholding comment until after a test of the hypothesis that the chain
is of order 4 (or less), we continue with the results for chains of
order 4. In this case, with 10 as the cutoff, there were eight entries.

Statistical  analysis.  — Under  the  assumption  that 
the eight $h$'s are exactly $\chi^2_1$ and independent, the
Kolmogorov-Smirnov test statistic shows a level of significance of at
least 0.675.

Statistical  conclusion.  — The  Kolmogorov-Smirnov 
test  indicates  that  the  eight  states  are  from  a  Markov 
chain  of  order  4,  since  over  two-thirds  of  the  time 
the  Kolmogorov-Smirnov  statistic  would  be  at  least 
this  significant  if  the  chain  is  of  order  4  and  the 
sample  sizes  arbitrarily  large. 

Statistical  analysis.  — The  test  of  the  hypothesis 
that  the  eight  h's  are  the  squares  of  independent 
observables  with  means  zero  and  variance  one 
accepts  the  hypothesis  at  level  0.4  and  rejects 
at  0.5. 

Statistical  conclusion.  — The  test  for  equality  of 
variances,  a  consequence  of  the  assumption  of  a 
Markov  chain  of  order  4  with  stationary  transition 
probabilities,  indicates  that  the  variances  are  the 
same. Order 4 and stationarity of transition probabilities appear to be
consistent with the eight states tested.

Overall  statistical  conclusions  on  Sabino  Creek, 
comparing  1945-55  with  1956-65  during  the 
summer  rainy  season.  — Both  the  Kolmogorov- 
Smirnov tests and the tests for equality of variances, applied to those
states with at least 10 occurrences during the first decade, indicate
that the states are from a chain of order 4.

Further  statistical  analysis  and  conclusions  on 
summer  data  from  Sabino  Creek.  — Only  the  test 
for  equality  of  variances  was  considered,  in  testing 
the  hypothesis  that  the  order  is  three  or  less.  For 
a  cutoff  of  10,  14  states  had  at  least  this  number  of 
occurrences  (out  of  a  total  possible  of  16).  With 
the  cutoff  of  10  applied  to  the  test  that  the  order 
is  two  or  less,  all  eight  states  qualified.  In  the  case 
of order 3, the test of equality of variances rejected
at  level  0.1.  For  the  order  2,  the  test  rejected  at 
level  0.05. 

Final  comments  on  statistical  analysis  of  Sabino 
Creek  during  the  summer  rainy  season.  — All  evi- 
dence used  gave  support  to  the  hypothesis  that 
the  summer  rainy  season  of  Sabino  Creek  is  a 
Markov  chain  of  order  4. 

A comparison of the actual Sabino Creek summer season with a computer
simulation. — The interpretation of the above Sabino data is that (1)
the second decade of the wet-dry data is a Markov chain of order 4, and
(2) the transition probabilities for the chain are the same as the
maximum-likelihood estimates of the transition probabilities for the
first decade, assuming it is also a chain of order 4.

While  these  facts  are  insufficient  to  actually  imply 
that  the  entire  two  decades  of  wet-dry  readings  is  a 
Markov  chain  of  order  4  with  stationary  transitions, 
we  simulate  the  summer  data  under  this  assump- 
tion. More  precisely,  we  assume  the  entire  two 
decades  is  a  Markov  chain  of  order  5.  (The  choice 
of  5  rather  than  4  came  about  through  a  misunder- 
standing; but  recall  each  chain  of  order  4  is  of  order 
5.)  The  transition  probabilities  for  the  two  decades 
are  estimated  by  the  usual  maximum  likelihood 
estimates  used  throughout.  (Note:  this  means  we 
assume  the  order  is  5  before  we  obtain  the  maximum 
likelihood  estimates.) 
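A simulation of this kind can be sketched as below. The sketch assumes the estimated transition probabilities are available as a table giving, for each pattern of the last $k$ days, the probability that the next day is wet; the table shown is invented for an order-2 chain and is not taken from the Sabino Creek estimates, and the starting days are drawn arbitrarily.

```python
import random

def simulate_chain(p_wet, k, n_days, seed=0):
    """Simulate a wet/dry (1/0) Markov chain of order k.

    p_wet  : dict mapping each k-tuple of 0/1 to P(next day is wet)
    n_days : length of the simulated record (at least k days are returned)
    seed   : seed for the pseudo-random generator, for repeatability
    """
    rng = random.Random(seed)
    days = [rng.randint(0, 1) for _ in range(k)]  # arbitrary initial state
    while len(days) < n_days:
        state = tuple(days[-k:])
        days.append(1 if rng.random() < p_wet[state] else 0)
    return days

# Hypothetical order-2 transition table, for illustration only.
p_wet_example = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.8}
print(simulate_chain(p_wet_example, k=2, n_days=30))
```

For an order-5 chain the table would have 32 entries, one for each state, filled from the maximum likelihood estimates described in the text.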

The  computer  simulation  of  such  a  chain  reveals 
some  rather  strong  evidence,  we  feel,  that  the 
actual  data  is  a  chain  of  order  5.  Of  course,  the 
reader  must  decide  this. 

To  compare  the  actual  data  with  the  simulated 
data,  we  choose,  perhaps  arbitrarily,  to  look  at 
the  distributions  of  runs.  This  is  done  as  follows: 
The  10th  through  100th  percentiles  of  the  sample 
distribution  functions  of  length  of  runs  are  com- 
pared, using  the  actual  and  simulated  data.  Both 
wet  and  dry  runs  are  compared  as  follows: 


                   Dry runs                 Wet runs
Percentile     Actual   Simulated      Actual    Simulated

    10            1        ...            1          1
    20            1        ...            1          1
    30            1        ...            1          1
    40            1        ...            1          1
    50            1        ...            2          3
    60            2         2             4          4
    70            2         2             5          4
    80           ...        2             7          6
    90            3         3            13          9
   100           27        24          43 (32)      29
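The run-length percentiles in such a comparison can be computed as in the following sketch. The percentile convention used here (the smallest run length at or below which the stated percentage of runs fall) is an assumption, since the paper does not state its convention, and the short record is invented.

```python
from itertools import groupby

def run_lengths(days, value):
    """Lengths of maximal runs of `value` (0 = dry, 1 = wet) in the record."""
    return [sum(1 for _ in group) for key, group in groupby(days) if key == value]

def percentile(lengths, pct):
    """Smallest run length L with at least pct percent of runs <= L.

    pct is an integer from 1 to 100; integer arithmetic avoids rounding
    surprises when computing the rank.
    """
    ordered = sorted(lengths)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceiling of pct*n/100
    return ordered[rank - 1]

record = [0, 0, 1, 0, 1, 1, 1, 0]  # made-up wet/dry record
for name, value in (("dry", 0), ("wet", 1)):
    lengths = run_lengths(record, value)
    print(name, [percentile(lengths, p) for p in range(10, 101, 10)])
```

Applying the same two functions to the observed record and to a simulated record gives the two columns to be compared for each kind of run.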


Summary  and  Conclusions 

The  results  of  the  analysis  of  Sabino  Creek  data 
in  terms  of  its  wet-dry  properties  include: 

1.  Statistical  evidence  that  the  wet-dry  se- 
quence in  the  January-February  period  may 
be  described  as  a  Markov  chain  of  order  5 
(days)  with  stationary  transition  probabilities. 

2.  Statistical  evidence  that  a  Markov  chain  of 
order  4  with  stationary  transitions  ade- 
quately describes  the  summer  period.  Thus, 
the  order  is  less  than  for  the  winter  as  might 
be  conjectured  from  the  meteorology  of  the 
region. 

3.  Comparison  of  distribution  functions  for 
actual  and  simulated  dry  runs  and  wet  runs 
for  the  Sabino  summer  flows  gives  strong 
evidence  that  the  actual  data  is  a  chain  of 
order 5. The greatest discrepancy occurs at the 100th percentile for
the wet runs; this indicates some problem with extremes.

The  above  results  set  the  stage  for  stochastic  models 
of  ephemeral  flow  on  Sabino  Creek  and  similar 
streams  that  explicitly  allow  for  the  intermittency 
of  flow.  We  feel  that  this  approach  is  more  natural 
than  current  stochastic  models  in  vogue  wherein 
zero  flows  are  arbitrarily  augmented  by  some  posi- 
tive number  and  then  the  entire  sequence  of  flows 
logarithmically  transformed. 
DISPERSION OF CONTAMINATED BEDLOAD PARTICLES


By H. W. Shen and H. F. Cheong ¹


Abstract 

The  analysis  is  based  on  the  concentration  dis- 
tribution function  describing  the  dispersion  of  con- 
taminated particles  released  simultaneously  from 
a  source  in  the  bed  of  a  stream  with  steady,  uniform 
flow.  This  is  a  Lagrangian  probabilistic  model, 
given  by  Yang  and  Sayre,  in  which  the  movement 
of each particle is described as an alternate sequence of steps (the
lengths of which are gamma and identically distributed with $K_1$ and
$r$ as the scale and shape parameters, respectively) and rest periods
(which, apart from being independent of the step lengths, are
themselves exponentially and identically distributed with parameter
$K_2$). It is shown
that  the  distribution  is  initially  highly  skewed  and 
becomes  progressively  symmetrical  with  time  so 
that  it  can  be  approximated  by  a  Gaussian  curve. 
The  dimensionless  peak  concentrations  for  different 
values  of  r  are  always  higher  than  their  respective 
asymptotic  solutions,  all  of  which  vary  inversely 
with  the  square  root  of  the  dimensionless  disper- 
sion time.  The  longitudinal  location  of  the  peak  of 
the  distribution  function  advances  slightly  faster 
than  the  movement  of  the  mass  center  of  the  distri- 
bution curve,  so  that  after  a  sufficiently  long  time 
the peak approaches the mass center. Also presented is a simplified
procedure for deriving an excellent approximation to the envelope of
the family of distribution curves parameterized by time even when the
shape factor $r$ takes on nonintegral values. This yields a curve that
is always uniformly lower than the envelope with the deviation between
the two curves diminishing with time. For a dimensionless time scale
greater than five, the error incurred is 5 percent and less, depending
upon the value of $r$.


¹ Professor of civil engineering and graduate assistant, respectively,
Department of Civil Engineering, Colorado State University, Fort
Collins, Colo.


Introduction 

Stochastic models describing the unidirectional movement of a sediment
particle which advances in a series of alternate rest and transport
periods have been proposed by a number of investigators, notably
Einstein (1), Hubbell and Sayre (2), Yang and Sayre (6), and Shen and
Todorovic (5). Contaminates, such as herbicides, pesticides, and
radioisotopes, can attach to sediment particles and move as bedload in
a stream, and the transport and dispersion of contaminates, such as
radioactive wastes, can be affected by the dispersion characteristics
of sediment particles. The purpose of this paper (based on current
knowledge of sediment transport) is to investigate the downstream
effects of an instantaneous injection of contaminated sediment
particles at a given locality in a straight alluvial stream with steady
uniform flow. The analysis is based on the stochastic model of sediment
dispersion of Yang and Sayre (6) wherein the shape parameter of the
gamma-distributed step lengths is allowed to vary from unity
(equivalent to an exponential distribution) to two and three.

Analysis 

Assume a straight alluvial channel where the flow is steady and
uniform. Consider a large number of contaminated particles having
identical transport characteristics as the sand grains forming the bed
to be released simultaneously at a time $t = 0$ and at a station
$x = 0$. The concentration distribution function (6) describing the
longitudinal dispersion of the contaminated particles is

$$f_t(x) = K_1 e^{-(K_1 x + K_2 t)} \sum_{n=1}^{\infty} \frac{(K_1 x)^{nr-1}}{\Gamma(nr)} \frac{(K_2 t)^n}{n!} \qquad (1)$$

where $K_1$, $K_2$, and $r$ are defined by the following probability
density functions:

$$f_X(x) = \begin{cases} \dfrac{K_1^r}{\Gamma(r)}\, x^{r-1} e^{-K_1 x}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

$$f_T(t) = \begin{cases} K_2 e^{-K_2 t}, & t \ge 0 \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

where $X$ and $T$ are random variables describing the step lengths and
rest periods of a single particle which moves downstream in an
alternate sequence of steps and rest periods. $f_t(x)$ indicates the
amount of contaminates per unit length downstream at $x$ and time $t$
when a unit amount (either gravimetric or volumetric) of contaminates
is injected instantaneously at $x = 0$ and $t = 0$. Experimental
evidence indicates that $r$ is between one and three.
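The series for the concentration distribution is straightforward to evaluate numerically. The sketch below assumes the series has the compound form implied by the model, in which the $n$th term is the gamma density of the sum of $n$ step lengths (shape $nr$, parameter $K_1$) weighted by the Poisson probability of $n$ completed rest periods (mean $K_2 t$); terms are summed in logarithms to avoid overflow. The parameter values are arbitrary and the function names are illustrative.

```python
import math

def concentration(x, t, k1, k2, r, n_terms=100):
    """Evaluate the concentration series f_t(x) for x > 0, t > 0.

    n_terms should comfortably exceed k2 * t; x <= 0 is treated as
    outside the plume and returns 0.
    """
    if x <= 0.0 or t <= 0.0:
        return 0.0
    a, b = k1 * x, k2 * t
    total = 0.0
    for n in range(1, n_terms + 1):
        log_term = ((n * r - 1) * math.log(a) - math.lgamma(n * r)
                    + n * math.log(b) - math.lgamma(n + 1) - a - b)
        total += math.exp(log_term)
    return k1 * total

def area_under_curve(t, k1, k2, r, x_max, dx=0.02):
    """Trapezoidal integral of f_t(x) from 0 to x_max."""
    n = int(round(x_max / dx))
    values = [concentration(i * dx, t, k1, k2, r) for i in range(n + 1)]
    return dx * (sum(values) - 0.5 * (values[0] + values[-1]))

# Arbitrary illustrative parameters (not from the flume runs).
area = area_under_curve(t=3.0, k1=1.0, k2=1.0, r=2.0, x_max=60.0)
print(area, 1.0 - math.exp(-3.0))  # the two numbers should nearly agree
```

The comparison printed at the end checks the area under the curve against $1 - \exp(-K_2 t)$, the fraction of particles that have left the source by time $t$.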

Decay of concentration at x = 0. — It is interesting to note that the
area under the curve $f_t(x)$ is $1 - \exp(-K_2 t)$, so that the decay
of the concentration level at the source is exponential with time.

Movement  of  mass  center.  — Under  the  conditions 
of steady, uniform flow, the time rate of movement of the mass center,
$\bar{x}$, is essentially constant. For the distribution function of
equation 1,

$$\bar{x} = \int_0^{\infty} x f_t(x)\, dx$$

and

$$\frac{d\bar{x}}{dt} = r \frac{K_2}{K_1} \qquad (4)$$

Neither the relationship between the sediment transport rate and the
rate of movement of $\bar{x}$ nor the variation of $K_1$, $K_2$, and
$r$ for different flow conditions is well understood at present.
Einstein (1) attempted to relate the rate of movement of the mass
center with the sediment transport rate for uniform sediment sizes as
follows:


dx 
dt 

and  for  nonuniform  sizes: 

dx 
dt 


72/3 


where $q_s$ is the sediment transport rate in liters per meter per unit
time. Sayre and Hubbell (4) suggested the following relationship
between the mass center and sediment discharge:

$$q_s = \gamma_s (1 - a)\, d\, \frac{d\bar{x}}{dt}$$

where $q_s$ is the sediment transport rate per unit width, $\gamma_s$
is the specific weight of the sediment, $a$ is the porosity of the bed,
and $d$ is the depth of the zone of particle movement. However, the
evaluation of $d$ requires a knowledge of bedforms which is not
available.

Migration  of  the  concentration  distributions.  —One 
can  discuss  the  movement  and  attenuation  of  the 
concentration  peak  by  considering  first  the  case  of 
$r = 1$. Equation 1 may be written as:

$$f_t(x) = K_1 e^{-K_1 x - K_2 t} \sqrt{\frac{K_2 t}{K_1 x}}\; I_1\!\left(2\sqrt{K_1 K_2 x t}\right) \qquad (5)$$

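where $I_1(\,)$ is the modified Bessel function of the first kind of
order one. When $f_t(x)$ is differentiated partially with respect to
$x$ and then set to zero, one can let $y = 2\sqrt{K_1 K_2 x t}$ and
thus

$$\frac{\partial f_t(x)}{\partial x} = \frac{\partial}{\partial x}\left[ K_1 K_2 t\, e^{-K_1 x - K_2 t}\, \frac{I_1\!\left(2\sqrt{K_1 K_2 x t}\right)}{\sqrt{K_1 K_2 x t}} \right]$$

$$= 2 K_1 K_2 t\, e^{-K_1 x - K_2 t}\, \frac{d}{dy}\!\left[\frac{I_1(y)}{y}\right] \frac{dy}{dx} \;-\; 2 K_1 K_2 t\, e^{-K_1 x - K_2 t}\, \frac{K_1 I_1(y)}{y}$$

$$= 2 K_1 K_2 t\, e^{-K_1 x - K_2 t} \left[ \sqrt{\frac{K_1 K_2 t}{x}}\, \frac{I_2(y)}{y} - \frac{K_1 I_1(y)}{y} \right] = 0$$

so that

$$\sqrt{\frac{K_1 K_2 t}{x_p}}\; I_2(y_p) - K_1 I_1(y_p) = 0$$

or

$$\sqrt{\frac{K_2 t}{K_1 x_p}} = \frac{I_1(y_p)}{I_2(y_p)} \qquad (6)$$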
where $y_p = 2\sqrt{K_1 K_2 x_p t}$ and $x_p$ is the position along $x$
where the turning point of $f_t(x)$ is located at any time, $t$.
Equation 6 was also given by Hubbell and Sayre (2). When $y_p$ is
sufficiently large for the asymptotic approximation to hold,

$$I_n(y_p) \approx \frac{e^{y_p}}{\sqrt{2\pi y_p}} \qquad (7)$$

where $I_n(\,)$ is the modified Bessel function of the first kind of
order $n$, we have from equation 6 and equation 4 (for $r = 1$),

$$\sqrt{\frac{K_2 t}{K_1 x_p}} \approx 1 \qquad (8)$$

or

$$x_p \approx \frac{K_2}{K_1}\, t = \bar{x}$$

After a sufficiently long time has elapsed, the position of the peak
concentration is almost coincident with the location of the mass
center. However, when $y_p$ is sufficiently small for the $y_p$ terms
with powers of two and higher in the ratio $I_1(y_p)/I_2(y_p)$ to be
negligible, one obtains

$$\frac{I_1(y_p)}{I_2(y_p)} = \frac{2}{y_p}\; \frac{\displaystyle\sum_{k=0}^{\infty} \frac{(y_p/2)^{2k}}{k!\,(k+1)!}}{\displaystyle\sum_{k=0}^{\infty} \frac{(y_p/2)^{2k}}{k!\,(k+2)!}} \approx \frac{4}{y_p}$$

From equation 6,

$$\sqrt{\frac{K_2 t}{K_1 x_p}} \approx \frac{4}{y_p} \qquad (9)$$

or

$$K_2 t \approx 2 \qquad (10)$$

which indicates that no turning point of $f_t(x)$ exists for a
dimensionless time scale less than two. With intermediate values of
$y_p$, the attenuation of the peak under the condition imposed by
equation 6 is given by:

$$\frac{f_t(x_p)}{K_1} = \frac{I_1^2(y_p)}{I_2(y_p)} \exp\left\{ -\frac{y_p}{2} \left[ \frac{I_1(y_p)}{I_2(y_p)} + \frac{I_2(y_p)}{I_1(y_p)} \right] \right\} \qquad (11)$$

which is shown in figure 1 by the curve for $r = 1$. To show this, we
have from equation 5:

$$\frac{f_t(x_p)}{K_1} = \sqrt{\frac{K_2 t}{K_1 x_p}}\; I_1(y_p)\, e^{-K_1 x_p - K_2 t} \qquad (12)$$

Note that

$$K_1 x_p\, K_2 t = \frac{y_p^2}{4} \qquad (13)$$

From equation 6:

$$\frac{K_2 t}{K_1 x_p} = \left[ \frac{I_1(y_p)}{I_2(y_p)} \right]^2 \qquad (14)$$

From equations 13 and 14:

$$K_1 x_p + K_2 t = \frac{y_p}{2} \left[ \frac{I_1(y_p)}{I_2(y_p)} + \frac{I_2(y_p)}{I_1(y_p)} \right] \qquad (15)$$

Substitution of equations 14 and 15 into equation 12 gives the desired
result.

For sufficiently large $t$, $\bar{x} = \dfrac{K_2}{K_1} t \approx x_p$
and $y_p \approx 2 K_2 t$. Equation 11 takes on the asymptotic solution

$$\frac{f_t(x_p)}{K_1} \approx \frac{e^{y_p}}{\sqrt{2\pi y_p}} \exp(-y_p) = \frac{1}{\sqrt{4\pi K_2 t}} \qquad (16)$$

in  agreement  with  a  similar  result  by  Hubbell  and 
Sayre  (2).  At  large  dispersion  times,  the  peak  con- 
centration varies  inversely  as  the  square  root  of 
the  dispersion  time.  At  smaller  and  intermediate 
dispersion  times,  the  decrease  in  the  peak  con- 
centration is  even  more  rapid  as  shown  in  figure  1. 

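It can be shown that for sufficiently large $t$, equation 1 may be
approximated by a Gaussian curve with mean $\bar{x}$ and variance
$\sigma^2$ where,

$$\bar{x} = \int_0^{\infty} x f_t(x)\, dx = \frac{r K_2 t}{K_1} \qquad (17)$$

and

$$\sigma^2 = \int_0^{\infty} (x - \bar{x})^2 f_t(x)\, dx \approx \frac{r(r+1) K_2 t}{K_1^2} \qquad (18)$$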
Jo 


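By virtue of the fact that the absolutely convergent infinite series of
equation 1 consists of nonnegative terms and that $f_t(x)$ is
approximately a probability density function as $t$ gets very large,
the expansion of the characteristic function of $X_*$ is, for
$x_* = K_1 x$ and $t_* = K_2 t$,

$$\phi_{X_*}(u) = \int_0^{\infty} e^{iux_*} \sum_{n=1}^{\infty} e^{-x_* - t_*}\, \frac{x_*^{\,nr-1}}{\Gamma(nr)}\, \frac{t_*^{\,n}}{n!}\, dx_* = \sum_{n=1}^{\infty} (1 - iu)^{-nr}\, \frac{t_*^{\,n} e^{-t_*}}{n!}$$

$$\approx \sum_{n=0}^{\infty} \frac{t_*^{\,n} e^{-t_*}}{n!} \left[ 1 + i\,nr\,u - \frac{nr(nr+1)}{2!} u^2 - i\, \frac{nr(nr+1)(nr+2)}{3!} u^3 + \ldots \right] \quad \text{for } t_* \gg 1$$

$$= 1 + i r t_* u - \left[ r^2 t_*^2 + r^2 t_* + r t_* \right] \frac{u^2}{2} - i \left[ r^3 t_*^3 + 3 r^3 t_*^2 + r^3 t_* + 3 r^2 t_*^2 + 3 r^2 t_* + 2 r t_* \right] \frac{u^3}{3!} + \ldots \qquad (19)$$

In a lemma on the expansion of a characteristic function of a random
variable, Parzen (3) stated that if $y$ is a random variable with zero
mean and finite variance, and if the third absolute moment is finite,
then, for $u$ such that $3u^2 E[y^2] \le 1$ and $|\theta| \le 1$,

$$\log_e \phi_y(u) = -E[y^2]\, \frac{u^2}{2} + \frac{\theta}{6}\, |u|^3 E[|y|^3] + 3\theta\, |u|^4 E^2[y^2].$$

By defining

$$y = X_* - E[X_*]$$

and

$$z = \frac{y}{\sigma[y]}$$

where $\sigma[y]$ is the standard deviation of the random variable $y$,
we have from the definition of the characteristic function

$$\log_e \phi_z(u) = \log_e \phi_y\!\left( \frac{u}{\sigma(y)} \right) \qquad (20)$$

Noting that a characteristic function also generates moments, we have

$$i^n E[X_*^n] = \phi^{(n)}(0), \qquad n \ge 1 \qquad (21)$$

where $\phi^{(n)}(0)$ is the $n$th derivative of $\phi_{X_*}(u)$ with
respect to $u$ evaluated at $u = 0$. The skewness coefficient of the
concentration distribution is

$$\frac{E[(X_* - E[X_*])^3]}{\left\{ E[(X_* - E[X_*])^2] \right\}^{3/2}} = \frac{E[X_*^3] - 3E[X_*^2]E[X_*] + 2E^3[X_*]}{\left\{ E[(X_* - E[X_*])^2] \right\}^{3/2}}$$

$$= \frac{i\phi^{(3)}(0) - 3i\,\phi^{(2)}(0)\,\phi^{(1)}(0) + 2i\,[\phi^{(1)}(0)]^3}{\left[ r^2 t_*^2 + r^2 t_* + r t_* - r^2 t_*^2 \right]^{3/2}}$$

$$= \frac{(r^2 + 3r + 2)\, r t_*}{(r^2 t_* + r t_*)^{3/2}} = \frac{r + 2}{\sqrt{r(r+1) t_*}}$$

so that

$$\frac{E[y^3]}{\left\{ E[y^2] \right\}^{3/2}} = \frac{E[(X_* - E[X_*])^3]}{\left\{ E[(X_* - E[X_*])^2] \right\}^{3/2}} = \frac{r + 2}{\sqrt{r(r+1) t_*}}$$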
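The skewness result is one case of a general scaling of the cumulants. The short derivation below is a sketch under the model's own assumptions: by dimensionless time $t_*$ the number of completed steps is Poisson with mean $t_*$ (exponential rest periods), and each dimensionless step length $S$ is gamma with shape $r$ and unit scale. The symbols $\kappa_s$ and $S$ are introduced here for illustration and do not appear in the original.

```latex
% Cumulants of a compound Poisson sum: kappa_s equals the Poisson mean
% times the s-th raw moment of a single step.
\kappa_s[X_*] = t_*\, E[S^s] = t_*\, \frac{\Gamma(r+s)}{\Gamma(r)}
             = t_*\, r(r+1)\cdots(r+s-1), \qquad s \ge 1 .

% In particular
\kappa_1 = r t_*, \qquad \kappa_2 = r(r+1)\, t_*, \qquad
\kappa_3 = r(r+1)(r+2)\, t_* .

% Standardizing by the standard deviation, z = (X_* - E[X_*]) / \sqrt{\kappa_2}:
\kappa_s[z] = \frac{\kappa_s[X_*]}{\kappa_2^{\,s/2}}
            = \frac{\Gamma(r+s)/\Gamma(r)}{[\,r(r+1)\,]^{s/2}}\; t_*^{\,1 - s/2},
\qquad s \ge 3 ,

% so that for s = 3 the skewness is
\kappa_3[z] = \frac{r+2}{\sqrt{r(r+1)\, t_*}} .
```

Every standardized cumulant of order three or higher therefore vanishes as $t_*$ grows, which is the mechanism behind the approach to a Gaussian shape.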
And the $s$th cumulant of $z$ is proportional to $t_*^{\,1 - s/2}$, for
$s = 3$, 4, 5, and higher cumulants. For sufficiently large $t_*$, the
skewness coefficient approaches zero and the use of the lemma with
equations 20, 21, and 24 yields, for $u$ satisfying the condition of
the lemma,

$$\log_e \phi_z(u) \approx -\frac{u^2}{2} \qquad (25)$$

or

$$\phi_z(u) \approx e^{-u^2/2} \qquad (26)$$

which is the characteristic function for a random variable which is
normally distributed with zero mean and unit variance.

$$f_{t_*}(x_*) \approx \frac{1}{\sqrt{2\pi r(r+1) t_*}} \exp\left[ -\frac{1}{2}\, \frac{(x_* - r t_*)^2}{r(r+1) t_*} \right] \qquad (27)$$

The  concentration  function  can,  therefore,  be 
approximately  represented  by  a  Gaussian  curve 
with  mean  and  variance  given  by  equations  17  and 
18  after  a  long  dispersion  time  has  elapsed.  The 
concentration  function  is  also  highly  skewed  at 
early  dispersion  times,  but  it  progressively  be- 
comes symmetrical. 
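The approach to a Gaussian curve can be illustrated numerically. The sketch below evaluates the dimensionless series $f_t(x)/K_1$ as a function of $x_* = K_1 x$ and $t_* = K_2 t$ (assuming the compound gamma-Poisson form of equation 1), locates the peak by a simple grid search, and compares it with a Gaussian of mean $r t_*$ and variance $r(r+1) t_*$. The grid spacing and the chosen $t_*$ are arbitrary.

```python
import math

def f_star(x_star, t_star, r, n_terms=200):
    """Dimensionless concentration f_t(x)/K_1 from the series of equation 1."""
    if x_star <= 0.0 or t_star <= 0.0:
        return 0.0
    total = 0.0
    for n in range(1, n_terms + 1):
        log_term = ((n * r - 1) * math.log(x_star) - math.lgamma(n * r)
                    + n * math.log(t_star) - math.lgamma(n + 1)
                    - x_star - t_star)
        total += math.exp(log_term)
    return total

def peak(t_star, r, dx=0.1):
    """Grid search for the peak over 0 < x_star <= 2 * r * t_star."""
    n = int(round(2.0 * r * t_star / dx))
    best_x, best_f = 0.0, 0.0
    for i in range(1, n + 1):
        x = i * dx
        f = f_star(x, t_star, r)
        if f > best_f:
            best_x, best_f = x, f
    return best_x, best_f

t_star, r = 50.0, 2.0
x_peak, f_peak = peak(t_star, r)
gauss_peak = 1.0 / math.sqrt(2.0 * math.pi * r * (r + 1.0) * t_star)
print(x_peak, r * t_star)   # peak position against the mass center
print(f_peak, gauss_peak)   # peak height against the Gaussian value
```

Repeating the comparison for smaller $t_*$ shows the larger departures expected while the distribution is still strongly skewed.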

For any $r \ge 1$, the asymptotic expression for the attenuation of the
peak is

$$\frac{f_t(x_p)}{K_1} \approx \frac{1}{\sqrt{2\pi r(r+1) t_*}} \qquad (28)$$

and the asymptotic expression for the position of the peak is

$$x_p \approx \frac{r K_2 t}{K_1} \qquad (29)$$

Thus, in general, after a sufficiently long time (say $t_* > 10$), the
peak concentration decreases inversely as the square root of the
dispersion time (fig. 1), and the longitudinal position of the mode
approaches the mass center of the distribution function (fig. 2).

Envelope of the concentration distributions. — The curves of $f_t(x)$
with $x$ as the abscissa for different times are shown in figures 3, 4,
5, 6, 7, and 8. Figures 3, 4, and 5 are based on the experimental
values of $d\bar{x}/dt$ and $d\sigma^2/dt$ of run 1M obtained by Yang
and Sayre (6) for selected values of $r = 1$, 2, and 3 with $K_1$ and
$K_2$ computed from equations 4 and 18, where,

$$\frac{d\sigma^2}{dt} = \frac{r(r+1) K_2}{K_1^2} \qquad (30)$$

Similarly, figures 6, 7, and 8 are based on the results of run 2M. Runs
1M and 2M represent the lowest and highest rates of dispersion,
respectively. Pertinent hydraulic data for both runs are given in
table 1.

It is interesting to note that the envelopes of $f_t(x)$ with $r = 1$,
2, and 3 for each run appear to collapse into a single curve. An
equation of the form $F_*(f_*, x_*, t_*) = \text{constant}$ may be
imagined as defining a curve in the $f_*$-$x_*$ plane for each fixed
value of $t_*$. An envelope of a family of curves in the $f_*$-$x_*$
plane is a curve $C$ with the property that for each point $P$ of $C$,
there is a curve of the family through $P$ tangent to $C$. The standard
method of finding envelopes $C$ of
$F_*(f_*, x_*, t_*) = \text{constant}$, which are not themselves curves
of the family, is to eliminate $t_*$ from the equations:

$$F_*(f_*, x_*, t_*) = \text{constant} \qquad (31)$$

and:

$$\frac{\partial F_*(f_*, x_*, t_*)}{\partial t_*} = 0 \qquad (32)$$

Table 1. Hydraulic data for runs 1M and 2M

Run number                                        1M          2M
Water surface slope X 10^[exponent illegible]     0.088       0.212
Water discharge (c.f.s.)                          1.140       1.690
Normal depth (ft.)                                0.518       0.521
Velocity of water (ft./sec.)                      1.100       1.625
Bedform                                           Ripples     Dunes
Total sediment concentration (p.p.m.)             60.210      871.550
Total sediment discharge (lb./sec.)               0.00429     0.0918
[row label missing in source]                     0.30-0.35   0.30-0.35
Velocity of tracer (ft./hr.)                      0.585       4.700
Rate of spread of tracer (ft.²/hr.)               1.724       20.200

For the case of r = 1, by introducing the following
change of variables, $u = \sqrt{2K_2 t}$ and $a = \sqrt{2K_1 x}$, we
have from equation 5

[expression illegible in source]

Substituting into equation 32,

$\frac{\partial F_*(f_*, a, u)}{\partial u} =$ [expression illegible in source] $= 0$  (33)

so that

[equation 34 illegible in source]  (34)

Recalling the following recurrence equation for
modified Bessel functions:

$I_2(au) = I_0(au) - \frac{2}{au} I_1(au)$,

equation 34 becomes, after some rearrangement,

[equation 35 illegible in source; the legible fragments are $au\,I_0(au)$ and $I_1(au)$]  (35)

For $au > 10$, $I_0(au) \sim I_1(au)$, and the point of tangency
between the envelope and the curve, $f_t(x)$,
is given approximately by $x_* = t_*$. With this approximate
solution to equation 34, the approximate
envelope is

$f_t(x_e) = K_1 e^{-2K_1 x_e} I_1(2K_1 x_e)$.  (36)

Note from equation 34 that $t_* > x_*$ and the curve
given by equation 36 is uniformly lower over x than
the envelope. The deviation between the two curves
decreases with x.

Figure 3. Concentration distribution curves. Run 1M, after Yang and Sayre (1971); r = 1.0, $K_1$ = 0.6784 in 1/foot, $K_2$ = 0.3972 in 1/hour.
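The relation between the concentration curves and the approximate envelope can be checked numerically. The Python sketch below is illustrative only: it assumes the dimensionless form for r = 1, $C/K_1 = e^{-x_* - t_*}\sqrt{t_*/x_*}\,I_1(2\sqrt{x_* t_*})$ (the form quoted as equation 45), evaluates $I_1$ from its power series, and compares the value on the line $x_* = t_*$ (equation 36) with the largest value found by scanning over time. The function names and the scanning grid are assumptions made for this example, not part of the original analysis.

```python
import math


def bessel_i1(z, terms=80):
    """Modified Bessel function I_1(z) summed from its power series."""
    half = z / 2.0
    term = half                      # k = 0 term: (z/2)^1 / (0! * 1!)
    total = term
    for k in range(1, terms):
        term *= half * half / (k * (k + 1))
        total += term
    return total


def concentration_r1(x_star, t_star):
    """Dimensionless concentration C/K1 for r = 1 (assumed form, eq. 45)."""
    root = math.sqrt(x_star * t_star)
    return (math.exp(-x_star - t_star)
            * math.sqrt(t_star / x_star)
            * bessel_i1(2.0 * root))


def approx_envelope(x_star):
    """Equation 36 in dimensionless form: the curve value where t* = x*."""
    return math.exp(-2.0 * x_star) * bessel_i1(2.0 * x_star)


def envelope_by_scan(x_star, t_max=40.0, step=0.01):
    """Hypothetical helper: largest concentration at x* over a grid of times."""
    best_t, best_f = None, -1.0
    for i in range(1, int(t_max / step) + 1):
        t = i * step
        f = concentration_r1(x_star, t)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


x_star = 5.0
t_best, f_best = envelope_by_scan(x_star)
print("approximate envelope (eq. 36):", approx_envelope(x_star))
print("scanned envelope value:       ", f_best, "at t* =", t_best)
```

In this sketch the scanned maximum occurs at a time somewhat larger than $x_*$ and lies slightly above the equation 36 value, which agrees with the statement that the approximate curve is uniformly lower than the envelope.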

As shown in figure 9, the actual total travelled
distance x(t) of a sediment particle is bounded by
the two functions $x_U(t)$ and $x_L(t)$. $x_L(t)$ represents
the lower bound for x(t) by assuming that the particle
is transported instantaneously
at the end of each rest period, while $x_U(t)$ represents
the upper bound for x(t) by assuming that the jth
jump takes place at the end of the (j-1)th rest
period. Consequently, the difference between $x_U(t)$
and $x_L(t)$ is not more than a step length and the
inequality [ratio illegible in source] > 1 holds. The ratio approaches
unity uniformly with time.

One may determine approximately the envelopes
of $f_t(x)$ for other values of r greater than unity by
using

$r_0 t_* = x_*$,  $r_0 \geq 1$  (37)

as the approximate solution to equation 32 in which
$r = r_0$. One obtains from equation 32, for any $r \geq 1$,

[equation 38 illegible in source; the legible fragments are $\Gamma(nr)$, $(n-1)!$, and "= 0"]  (38)

Let

[equation 39 illegible in source; the legible fragments are $\Gamma(nr)$ and $n!$]  (39)

The exact solution to equation 38 yields $\Delta = 0$.
Thus, $\Delta$ is the residual for any approximate solution
to equation 32. Table 2 shows the residuals for
run 1M when equation 37 is used as the approximate
solution to equation 32 for $r_0$ = 1, 2, and 3.

Table 2. Residuals arising from the approximation

            Residual $\Delta$
r     $t_*$ = 1   $t_*$ = 2   $t_*$ = 3   $t_*$ = 4
1     0.0933    0.0283    0.0146    0.0093
2      .0370     .0100     .0070
3      .019      .0040     .0001

The procedure of using equation 37 to derive an
approximation to the envelope of $f_t(x)$ is worthy of
mention since it provides a quick and simplified
solution to the problem, especially when r takes on
a nonintegral value for which a direct closed
form solution is not possible. Depending upon the
value of r, not much sacrifice in accuracy is involved
when $t_*$ exceeds 5. However, this technique
yields a curve that is uniformly lower than the
envelope. The deviation between the two curves
diminishes with distance. Moreover, the technique
provides a better approximation for higher values
of r.

For each run, the approximate curves for r = 1, 2,
and 3 appear to collapse into a single curve, and no
significant differences are discernible except at
very small values of x as shown in figure 10.

Figure 4. Concentration distribution curves. Run 1M, after Yang and Sayre (1971); r = 2.0, $K_1$ = 1.0176 in 1/foot, $K_2$ = 0.2979 in 1/hour.

Figure 5. Concentration distribution curves. Run 1M, after Yang and Sayre (1971); r = 3.0, $K_1$ = 1.3568 in 1/foot, $K_2$ = 0.2648 in 1/hour.

Figure 6. Concentration distribution curves.
Figure 7. Concentration distribution curves.

The asymptotic form of equation 36 is

$\frac{f_t(x_e)}{K_1} \sim \frac{1}{\sqrt{4\pi x_{*e}}}$  (40)

Since a = u, and x = [expression illegible in source], and from equation 8, we
have

[equation 41 illegible in source]  (41)

and

[equation 42 illegible in source]  (42)

In other words, the envelope of the family of
curves $f_t(x)$ is tangent to the curves at their peaks
after a long time has elapsed.

Figure 8. Concentration distribution curves. Run 2M, after Yang and Sayre (1971); r = 3.0, $K_1$ = 0.931 in 1/foot, $K_2$ = 1.458 in 1/hour.

Figure 9. Graphical representation of the movement of a single particle.

Contours of constant concentration as a function
of time and distance: Isoconcentration contours
for different r values for runs 1M and 2M are shown
in figures 11 and 12, respectively. The intercepts
of each vertical line with a specified contour indicate
the two limits of the critical zone where the concentration
is above that particular limit. The intercepts
of each horizontal line with a contour define
the limits of the critical time period when local
concentrations exceed the specified level.

The movement of the mass centers for different
values of r is also shown in figures 11 and 12.
Since

$\bar{x} = \frac{r K_2 t}{K_1}$  (43)

then

$\bar{x}_* = K_1 \bar{x} = r t_*$.  (44)

An isoconcentration contour for the concentration
C with r = 1 is given by equation 5 with $x_* = K_1 x$
and $t_* = K_2 t$,

$\frac{C}{K_1} = e^{-x_* - t_*} \sqrt{\frac{t_*}{x_*}}\, I_1\!\left(2\sqrt{x_* t_*}\right)$.  (45)

The intercept between the mass center line and the
concentration contour is, for $t_* = t_{*m}$, given by

$\frac{C}{K_1} = e^{-2t_{*m}} I_1(2t_{*m})$.  (46)

When the time under consideration is sufficiently
large for equation 7 to hold, the asymptotic solution
is

$\frac{C}{K_1} \sim e^{-2t_{*m}} \frac{e^{2t_{*m}}}{\sqrt{4\pi t_{*m}}} = \frac{1}{\sqrt{4\pi t_{*m}}}$,  (48)

which shows that the intercept between the mass
center line and the isoconcentration contour varies
inversely as the square of the concentration after
a long dispersion time, that is,

$t_{*m} \sim \frac{1}{4\pi}\left(\frac{K_1}{C}\right)^2$.  (49)


Concluding  Remarks 

The analysis has been based on the stochastic
model given by Yang and Sayre (1971) and the
shape factor in the gamma distribution was allowed
to take on integral values from one to three.

The concentration distribution function is initially
highly skewed and becomes progressively symmetrical
with time in that after a sufficiently long time
(about $K_2 t \geq 10$), it can be approximated by a
Gaussian curve. The dimensionless peak concentrations
for r = 1, 2, and 3 are uniformly higher than
their respective asymptotic solutions so that after
a long dispersion time, the peak varies as the inverse
square root of the time. The location of the
peak advances slightly faster than the mass center,
and they are almost coincident after a long time has
elapsed. A simplified procedure of approximating
the envelope of the family of distribution curves
for a specified run yields a curve that is uniformly
lower than the envelope. The deviation between
the two curves is insignificant when $x_*/r > 5$. This
technique is worthy of mention since it affords a
way of deriving an envelope even for nonintegral
values of r.

Figure 11. Iso-concentration contours.

Figure 12. Iso-concentration contours.

A NOTE ON MIXED DISTRIBUTIONS IN HYDROLOGY


By R. H. Hawkins ¹


Abstract 

The use of mixed distributions in hydrology is
explored, and a decomposition technique through
the method of moments is reviewed and further
developed. The distinction between mixed distributions
and mixed variables in hydrology is drawn and
illustrated. Several hydrologic records are decomposed
and the resultant frequency distributions are
compared against the customary Pearson-III. The
use of a mixed distribution model in the place of the
currently used distributions is not demonstrated.

Introduction 

Frequency distributions have found application
in hydrology in the estimation of extreme values,
as descriptive devices, and in simulation work. A
variety of distributions have been used, such as the
normal distribution, the two- and three-parameter
log-normal, two- and three-parameter gamma distributions,
log-Pearson-III, and the Gumbel and
Log-Gumbel (2, 5, 9, 10). A vigilant search would
undoubtedly uncover others.

The normal distribution is the function of reference
in statistics. It is widely used and understood,
and serves as a standard of comparison for other
distributions. Although simple indices of normality
(that is, the mean and variance) can be easily calculated
from sample data, hydrologic variables tend
to be notoriously nonnormal. Attempts have been
made to explain this nonnormality in hydrologic
variables and to account for it with a third moment
descriptor, the skewness. Use of the log-normal
distribution as a means of removing skewness from
hydrologic distributions is also practiced, although
such a transformation of data does not always
accomplish normalization.


That hydrologic variables are not normally distributed
should not be surprising. Reich (9)
comments that:

Nature  has  no  back  room  boy  dictating  that  flood  series  should 
follow  a  particular  law  .  .  .  Rather  let  us  visualize  .  .  .  mathe- 
matical functions  for  what  they  are  — merely  a  continuation  of 
man's  efforts  at  curve  fitting. 

Hydrologic variables are almost always the result
of multiple causes or several factors. Measurements
can be envisioned as sampling either (1) the combined
effects of several phenomena in a single
sample or (2) different phenomena separately
over time in a combined sample.

The approach herein will proceed on the latter
assumption, that is, that certain hydrologic variables
(as sampled, tabulated, and applied in practice) are
the combined (or mixed) sample from two distinct
distributions, but sampled blind as a single phenomenon.
An example of such might be annual flood
peaks, as arising from two sources: (1) summer
thunderstorms or (2) errant hurricanes; each with
its own descriptive distribution. The use of two
subpopulations is arbitrary, but convenient. It will
become apparent that the technique and reasoning can
be extended to three, four, or any number of subpopulations,
although this becomes increasingly
difficult. Also, for simplicity's sake, only normal
subpopulations (with or without logarithmic transformation)
will be assumed, although this too may
be altered, using the technique illustrated as a
starting point.

Figure 1 shows diagrammatically the problem
situation: Combined normal distributions witnessed
as a single, distinctly nonnormal distribution. The
task is to break down, or decompose, the single distribution
into its assumed two normal components.


¹ State University College of Forestry at Syracuse University,
Syracuse, N.Y. 13210. Present address: Department of Forest
Science, Utah State University, Logan, Utah 84322.


Background 

Distributions in hydrology which can be decomposed
into simpler components have been suggested
by Anderson (1), who is worth quoting verbatim
here:

Complicated frequency functions (double peaked, etc.) can
easily be thought of as decomposed into a few simpler frequency
functions, which when "added," give their final composite effect.
Each of the simpler functions is taken to be more and more
nearly a Gaussian frequency function as the decomposition
proceeds to more elementary levels.

Anderson also warns, however, that:

. . . even a basic effect may not behave according to a simple
Gaussian frequency function. Nature does not follow a Gaussian
law precisely.

The problem was first approached by Karl
Pearson in 1894 (7), who attempted to explain
observed nonnormal distributions through the
presence of mixtures. For a solution, he derived six
normal distribution moment equations and reduced
the system to a single nonic (ninth degree) expression.
Further work was carried out by Charlier and
Wicksell (3) and Cohen (4), who gives a basic
statistical background and a summary of the method
of moments used herein, and techniques that
circumvent dealing directly with the nonic polynomial.
A detailed, general, serious, and mathematical
treatment of much of the problem is presented
by Medgyessy (6).

Figure 1. The mixed distribution problem. Only the top distribution
(solid line) is "seen," and only through sample moments.
Assuming the subpopulations to be normal, determine
their parameters ($\mu$, $\sigma$) and weights ($\alpha_1$, $\alpha_2$).

Thus,  the  problem  has  an  old  and  honorable 
history.  However,  in  the  past  it  has  had  only  limited 
application  because  of  the  laborious  calculations 
necessary  for  accurate  solution.  In  hydrology,  use 
of  the  mixed  distributions  has  been  quite  limited. 

Singh (11) pursued the matter of mixed hydrologic
distributions for certain streams in Illinois, and
decomposed several records of monthly flows into
two log-normal components.

Yevdjevich and Jeng (16) studied a special case
of mixed distributions, and approached the problem
from the standpoint of nonhomogeneity in time
series. They treated the case of constant and linear
jumps and evaluated their effects on the composite
moments. They were not, however, concerned with
decomposition, and restricted their considerations
to a constant component variance.

Potter (8) suggested that the "dog-leg" appearing
in the flood frequency curves of certain streams was
the result of sampling from two different populations
of peak rates of runoff, and he demonstrated that the
occurrence of "dog-legs" was common in widely
separated locations. This is perhaps the instance of
mixed distributions which is most familiar to
hydrologists.

Stoddard and Watt (12) contrived a method for
combining the frequency curves for summer rainfall
floods and winter snowmelt floods for estimation
of extreme floods in southern Ontario.

Upchurch (14) used the mixed distribution idea
to show differences in beach sediments on the Great
Lakes. The method assumes that the sediment is
composed of log-normal components representing
three different sources or depositional processes.
A graphical decomposition of the textural-frequency
plot is used. Because of the differences in the nature
and the amount of the data, these methods cannot
be used with confidence with most hydrologic data.

The topic is dealt with extensively in certain other
fields of science, for example, biochemistry.
Generally, either a graphical or an electronic analog
decomposition procedure is used, with elements of
trial and error, or subjectivity, or both. (See, for
example, Tung (13).)
The Mixed Distribution Model

A two-component mixed distribution can be
expressed as:

$f(X) = \alpha_1 f_1(X) + \alpha_2 f_2(X)$  (1)

where $\alpha_1 + \alpha_2 = 1$, $0 < (\alpha_1, \alpha_2) < 1$, and $f_1(X)$
and $f_2(X)$ are normal distribution functions described
by $\mu_1$, $\sigma_1$, $\mu_2$, and $\sigma_2$, their means and
standard deviations. $\alpha_1$ and $\alpha_2$ are the relative
weights of each component distribution. The subpopulations
are assumed to be normal and have no
skewness, although the mixture may indeed be
skewed.

The probability density function for the two-component
mixed normal distribution is thus:

$f(X) = \frac{\alpha_1}{\sigma_1\sqrt{2\pi}} \exp\left[-\frac{(X-\mu_1)^2}{2\sigma_1^2}\right] + \frac{\alpha_2}{\sigma_2\sqrt{2\pi}} \exp\left[-\frac{(X-\mu_2)^2}{2\sigma_2^2}\right]$  (2)
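As an illustration of equation 2, the short Python sketch below evaluates the two-component mixed normal density and confirms numerically that it integrates to unity. The parameter values and the helper names (`normal_pdf`, `mixed_normal_pdf`, `trapezoid_area`) are assumptions chosen for this example; they are not taken from any record analyzed in the paper.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixed_normal_pdf(x, alpha1, mu1, sigma1, mu2, sigma2):
    """Equation 2: weighted sum of two normal densities, alpha2 = 1 - alpha1."""
    if not 0.0 < alpha1 < 1.0:
        raise ValueError("alpha1 must lie strictly between 0 and 1")
    alpha2 = 1.0 - alpha1
    return (alpha1 * normal_pdf(x, mu1, sigma1)
            + alpha2 * normal_pdf(x, mu2, sigma2))


def trapezoid_area(func, lo, hi, n=20000):
    """Area under func on [lo, hi] by the trapezoidal rule (check only)."""
    h = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, n):
        total += func(lo + i * h)
    return total * h


# Illustrative parameters only (assumed for this example).
params = dict(alpha1=0.7, mu1=4.0, sigma1=1.0, mu2=8.0, sigma2=3.0)
area = trapezoid_area(lambda x: mixed_normal_pdf(x, **params), -30.0, 50.0)
print("area under the mixed density:", area)
```

Although each component is symmetric, the density produced with these illustrative values is skewed toward the larger, more dispersed component.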

Mixed  Distributions  and  Mixed 
Variables 

A  distinction  should  be  made  between  mixed 
distributions  and  the  closely  related  phenomenon 
of  mixed  variables.  Mixed  distributions  in  hydrology 
might  include  annual  flood  peaks,  storm  rainfalls, 
system  output  before  and  after  alteration,  and  other 
similar  data  falling  into  a  simple  "either-or"  source 
category.  That  is,  the  sample  or  measurement 
taken  is  solely  from  discrete  describable  sources 
or  populations,  akin  to  a  binomial  choice  situation. 

Mixed variables are a different situation. This
situation occurs when the measurement or sample is
taken of components already in a combined state.
An example in hydrology might be streamflow
measurements which contained both surface runoff
water and ground water components. In this
instance, it would be invalid to use mixed distribution
reasoning and algebra to decompose data into
component parameters.

The fundamental difference between the two
situations results in different moment equations,
and thus calls for different means of decomposition.
One is a mixture of distributions, the other is the
distribution of mixtures or sums.

Figure  2  illustrates  the  differences  between  mixed 


distributions  and  mixed  variables.  No  decomposition 
of  mixed  variables  is  undertaken  here. 
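The distinction can be made concrete with a small simulation. The following Python sketch draws one sample in which each observation comes from one source or the other (a mixed distribution) and another in which each observation is the sum of two independent components (a mixed variable), then compares the sample means and variances with the relations listed in figure 2: $\mu = \alpha_1\mu_1 + \alpha_2\mu_2$ and $\sigma^2 = \alpha_1\sigma_1^2 + \alpha_2\sigma_2^2 + \alpha_1\alpha_2(\mu_1 - \mu_2)^2$ for the mixture, and $\mu = \mu_1 + \mu_2$ and $\sigma^2 = \sigma_1^2 + \sigma_2^2$ for the sum of independent components. The parameter values, seed, and function names are assumptions made for illustration.

```python
import random


def sample_mixed_distribution(n, alpha1, mu1, sigma1, mu2, sigma2, rng):
    """Each observation comes from one source or the other (X = X1 or X2)."""
    out = []
    for _ in range(n):
        if rng.random() < alpha1:
            out.append(rng.gauss(mu1, sigma1))
        else:
            out.append(rng.gauss(mu2, sigma2))
    return out


def sample_mixed_variable(n, mu1, sigma1, mu2, sigma2, rng):
    """Each observation is the sum of independent components (X = X1 + X2)."""
    return [rng.gauss(mu1, sigma1) + rng.gauss(mu2, sigma2) for _ in range(n)]


def mean_and_variance(values):
    """Sample mean and (population-form) variance."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v


rng = random.Random(1971)
alpha1, mu1, sigma1, mu2, sigma2 = 0.7, 4.0, 1.0, 8.0, 3.0  # illustrative
alpha2 = 1.0 - alpha1

mix_mean, mix_var = mean_and_variance(
    sample_mixed_distribution(100000, alpha1, mu1, sigma1, mu2, sigma2, rng))
sum_mean, sum_var = mean_and_variance(
    sample_mixed_variable(100000, mu1, sigma1, mu2, sigma2, rng))

# Theoretical values from the figure 2 relations.
expected_mix_mean = alpha1 * mu1 + alpha2 * mu2
expected_mix_var = (alpha1 * sigma1**2 + alpha2 * sigma2**2
                    + alpha1 * alpha2 * (mu1 - mu2) ** 2)
expected_sum_mean = mu1 + mu2
expected_sum_var = sigma1**2 + sigma2**2   # independent components

print("mixed distribution:", mix_mean, mix_var)
print("mixed variable:    ", sum_mean, sum_var)
```

The two samples share the same component parameters yet have different moments, which is why the moment equations for one case cannot be used to decompose the other.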

Mixed  Distribution  Decomposition 

If  it  were  possible  to  decompose  hydrologic  data 
into  component  distributions,  the  results  could  be 
used  to  advantage  in  several  ways,  such  as  extension 
of  frequency  curves  (prediction  of  rare  events), 
simulation  studies,  and  as  an  aid  in  understanding 
the  basic  underlying  phenomena. 

Moments, or expected values, can be used to
describe both samples and populations, and can
be easily calculated from a set of data. Thus,
starting with the general expression $E(X^k) =
\alpha_1 E(X_1^k) + \alpha_2 E(X_2^k)$, then expanding the basic
central moment relationships, and recognizing that
in normal distributions the skewness (g) = 0, the
kurtosis ($\alpha_4$) = 3, and $\alpha_5$ = 0, the following can be
shown, albeit with a great deal of algebra. These
results can also be obtained by using a moment-generating
function technique.


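$E(X^0) = 1 = \alpha_1 + \alpha_2$  (3)

$E(X^1) = \alpha_1\mu_1 + \alpha_2\mu_2$  (4)

$E(X^2) = \alpha_1(\sigma_1^2 + \mu_1^2) + \alpha_2(\sigma_2^2 + \mu_2^2)$  (5)

$E(X^3) = \alpha_1\mu_1(3\sigma_1^2 + \mu_1^2) + \alpha_2\mu_2(3\sigma_2^2 + \mu_2^2)$  (6)

$E(X^4) = \alpha_1[3\sigma_1^2(2\mu_1^2 + \sigma_1^2) + \mu_1^4] + \alpha_2[3\sigma_2^2(2\mu_2^2 + \sigma_2^2) + \mu_2^4]$  (7)

$E(X^5) = \alpha_1[5\mu_1\sigma_1^2(3\sigma_1^2 + 2\mu_1^2) + \mu_1^5] + \alpha_2[5\mu_2\sigma_2^2(3\sigma_2^2 + 2\mu_2^2) + \mu_2^5]$  (8)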

These  are  the  six  normal  distribution  moment 
equations,  in  slightly  different  form,  first  presented 
by  Pearson  in  1894  (7).  Equations  3  through  5  are 
distribution-free,  that  is,  their  validity  does  not  hang 
on  a  specific  distribution.  Equations  6  through  8, 
however,  depend  on  the  assumption  of  normality. 
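The moment equations can be verified numerically. The Python sketch below codes equations 3 through 8 for a two-component normal mixture and compares each raw moment with a direct trapezoidal integration of $X^k f(X)$. The parameter values, integration limits, and function names are assumptions made for this illustration only.

```python
import math


def normal_raw_moments(mu, sigma):
    """Raw moments E(X^k), k = 0..5, of a normal distribution."""
    s2 = sigma * sigma
    return [
        1.0,
        mu,
        s2 + mu**2,
        mu * (3.0 * s2 + mu**2),
        3.0 * s2 * (2.0 * mu**2 + s2) + mu**4,
        5.0 * mu * s2 * (3.0 * s2 + 2.0 * mu**2) + mu**5,
    ]


def mixture_raw_moments(alpha1, mu1, sigma1, mu2, sigma2):
    """Equations 3 through 8: E(X^k) = alpha1*E(X1^k) + alpha2*E(X2^k)."""
    alpha2 = 1.0 - alpha1
    m1 = normal_raw_moments(mu1, sigma1)
    m2 = normal_raw_moments(mu2, sigma2)
    return [alpha1 * a + alpha2 * b for a, b in zip(m1, m2)]


def mixture_pdf(x, alpha1, mu1, sigma1, mu2, sigma2):
    """Two-component mixed normal density (equation 2)."""
    def pdf(mu, sigma):
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return alpha1 * pdf(mu1, sigma1) + (1.0 - alpha1) * pdf(mu2, sigma2)


def numerical_raw_moment(k, params, lo=-40.0, hi=60.0, n=40000):
    """Independent check: integrate x^k * f(x) with the trapezoidal rule."""
    h = (hi - lo) / n
    total = 0.5 * (lo**k * mixture_pdf(lo, *params)
                   + hi**k * mixture_pdf(hi, *params))
    for i in range(1, n):
        x = lo + i * h
        total += x**k * mixture_pdf(x, *params)
    return total * h


params = (0.7, 4.0, 1.0, 8.0, 3.0)   # alpha1, mu1, sigma1, mu2, sigma2 (illustrative)
formula = mixture_raw_moments(*params)
numeric = [numerical_raw_moment(k, params) for k in range(6)]
for k in range(6):
    print(k, formula[k], numeric[k])
```

Agreement between the two columns supports the algebra in equations 6 through 8, which depend on the assumption of normal components.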

For a given set of data, the expressions E(X),
. . . $E(X^5)$ can be estimated by the sample
moments $\overline{X^k} = \sum X^k / N$. Solution of the six equations
gives component estimates of $\alpha$, $\mu$, $\sigma$ as a, $\bar{X}$, and
S, defining the subpopulations and their relative
weights.
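Figure 2. Schematic distinction between mixed variables and mixed distributions.

MIXED VARIABLES
  Variables:         $X_1$: $\mu_1$, $\sigma_1$;  $X_2$: $\mu_2$, $\sigma_2$;  correlation $\rho$
  Supply diagram:    (diagram not reproduced)
  Sample:            $X = X_1 + X_2$
  Moments:           $\mu = \mu_1 + \mu_2$
                     $\sigma^2 = \sigma_1^2 + 2\rho\sigma_1\sigma_2 + \sigma_2^2$
                     $E(X^k) = E(X_1 + X_2)^k$
  Density function:  $f(X) = \int_0^X f_1(X - \Xi) f_2(\Xi)\, d\Xi$  (if $X_1$ and $X_2$ independent)
  Example:           Monthly stream flows from ground water contribution plus surface water flows

MIXED DISTRIBUTIONS
  Variables:         $X_1$: $\alpha_1$, $\mu_1$, $\sigma_1$;  $X_2$: $\alpha_2$, $\mu_2$, $\sigma_2$
  Supply diagram:    (diagram not reproduced)
  Sample:            $X = X_1$ or $X_2$
  Moments:           $\mu = \alpha_1\mu_1 + \alpha_2\mu_2$
                     $\sigma^2 = \alpha_1\sigma_1^2 + \alpha_2\sigma_2^2 + \alpha_1\alpha_2(\mu_1 - \mu_2)^2$
                     $E(X^k) = \alpha_1 E(X_1^k) + \alpha_2 E(X_2^k)$
  Density function:  $f(X) = \alpha_1 f_1(X) + \alpha_2 f_2(X)$
  Example:           Annual flood peaks from thunderstorms or from hurricanes
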


Solution of the equations is not simple. It is
aided by knowledge of any relationships between
the variables. Such knowledge, for example,
might be that $\alpha_1 = \alpha_2 = 0.5$, made as a simplifying
assumption (11), or that $\alpha_1 = n_1/n$ and $\alpha_2 = n_2/n$
where the sample sizes of the different component
populations are known or suspected. Potter (8)
found that the dog-legs of his frequency plots could
be partially defined by the season of the flood, thus
offering an estimate of $n_1$ and $n_2$. Yevdjevich and
Jeng (16) work from the assumption of a constant
jump $\delta$ in a hydrologic series, so that ". . . the constant
jump does not change the variance $\sigma^2$ in each
part;" thus, $\sigma_1 = \sigma_2$. The effect of this prior knowledge
is to reduce the number of moment equations
(equations 3 through 8 above) needed for solution
of the unknowns.


However, if no a priori information is either available
or specified, solution of the six equations above
must be carried out. Examination of these equations
and the availability of an electronic digital computer
prompted a solution outlined in detail in the
Appendix. Briefly, it uses a multiple trial and error
technique to produce an array of valid $a_1$ and $\bar{X}_1$
values which satisfy the $\overline{X^4}$ and $\overline{X^5}$ values. When
agreement for a given $a_1$ is reached for the two $\bar{X}_1$
values, a solution is attained. Thence, $a_2$, $\bar{X}_2$, $S_1$,
and $S_2$ are calculated in a straightforward manner.
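The Appendix is not reproduced here, so the following Python sketch shows only one possible reading of a single trial in such a procedure, not the author's program. For trial values of $a_1$ and $\bar{X}_1$ it assumes that equation 4 gives $\bar{X}_2$, that equations 5 and 6 (which are linear in the two variances) give $S_1^2$ and $S_2^2$, and that the mismatch in equations 7 and 8 is then inspected; a trial is rejected when a variance is negative. All function names and parameter values are hypothetical.

```python
def normal_raw_moments(mu, var):
    """Raw moments E(X^k), k = 0..5, of a normal distribution with variance var."""
    return [1.0, mu, var + mu**2, mu * (3 * var + mu**2),
            3 * var * (2 * mu**2 + var) + mu**4,
            5 * mu * var * (3 * var + 2 * mu**2) + mu**5]


def trial_components(a1, x1, m):
    """Solve equations 4 to 6 for a trial weight a1 and first mean x1.

    m is the list of raw moments [1, m1, ..., m5]. Returns
    (a2, x2, var1, var2), or None when the trial is not valid.
    """
    if not 0.0 < a1 < 1.0:
        return None
    a2 = 1.0 - a1
    x2 = (m[1] - a1 * x1) / a2                       # equation 4
    if x2 == x1:
        return None
    # Equations 5 and 6 reduce to two linear equations in the variances:
    #   a1*v1 + a2*v2 = b5   and   a1*x1*v1 + a2*x2*v2 = b6
    b5 = m[2] - a1 * x1**2 - a2 * x2**2
    b6 = (m[3] - a1 * x1**3 - a2 * x2**3) / 3.0
    det = a1 * a2 * (x2 - x1)
    var1 = (b5 * a2 * x2 - a2 * b6) / det
    var2 = (a1 * b6 - a1 * x1 * b5) / det
    if var1 < 0.0 or var2 < 0.0:
        return None
    return a2, x2, var1, var2


def residuals(a1, x1, m):
    """Mismatch in equations 7 and 8 for a trial (a1, x1); None if invalid."""
    comp = trial_components(a1, x1, m)
    if comp is None:
        return None
    a2, x2, var1, var2 = comp
    n1 = normal_raw_moments(x1, var1)
    n2 = normal_raw_moments(x2, var2)
    r4 = a1 * n1[4] + a2 * n2[4] - m[4]
    r5 = a1 * n1[5] + a2 * n2[5] - m[5]
    return r4, r5


# Exact moments of an illustrative mixture: a1 = 0.7, means 4 and 8,
# variances 1 and 9 (assumed values, not data from the paper).
n1 = normal_raw_moments(4.0, 1.0)
n2 = normal_raw_moments(8.0, 9.0)
moments = [0.7 * p + 0.3 * q for p, q in zip(n1, n2)]

good = residuals(0.7, 4.0, moments)   # trial at the true values
bad = residuals(0.5, 4.0, moments)    # trial that yields a negative variance
comps = trial_components(0.7, 4.0, moments)
print("residuals at true values:", good)
print("invalid trial returns:", bad)
```

A complete search would scan many $(a_1, \bar{X}_1)$ pairs for points where both residuals vanish; because more than one such point may exist, the possibility of multiple solutions remains.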

Results 

Data from several sources were collected, the
moments calculated, and, where possible, the decompositions
carried out by the above described
procedure. The results were varied and surprising.
There were sometimes no valid solutions to the
equations, that is, no solutions for the a, $\bar{X}$, and S
values for the subpopulations which satisfied the
conditions of:

$a_1, a_2, S_1, S_2 \geq 0$
$a_1 + a_2 = 1$

However,  there  were  instances  of  one  and  two 
solutions,  as  shown  in  table  1.  Figures  3  and  4 
show  cumulative  frequency  plots  for  several  sets 
of  data.  The  mixed  distribution  solution  is  shown 
with  the  log  Pearson-III  as  calculated  by  the 


standard  procedure  (15)  for  comparison.  Points 
shown  are  plotted  according  to  the  formula: 

$pp = (m - \tfrac{1}{2})/n$.

The decreasing slopes at the upper extremes for
the flood peak and the rainfall intensity data should
be noted. They suggest an asymptotic approach to an
upper limit, which is in consonance with the concept
of a maximum possible precipitation or an ultimate
flood.

Discussion  and  Conclusions 

That the mixed distribution model fits hydrologic
data is not in itself unusual. The model can be
considered simply as a five-parameter distribution,
which indeed it is. In the limited studies carried out
herein, no comparisons of goodness of fit, such as the
probability of $\chi^2$, were made.

Table 1. Mixed distribution decompositions

Data                                                   Solutions  Index     a        X̄         S
Annual flood peaks, Charlotte Creek at Davenport           2        1     0.774     3.896     0.998
  Center, N.Y., 1938-67.                                            2      .226     7.344     3.761
                                                                    1      .939     4.187     1.511
                                                                    2      .061    12.214     1.876
    Logarithmic                                            0
Annual flood peaks, Genesee River at Scio, N.Y.,           2        1      .673     6.248     1.160
  1917-67.                                                          2      .327    11.920     5.571
                                                                    1      .883     6.835     2.272
                                                                    2      .117    17.637     3.554
    Logarithmic                                            0
Annual flood peaks, Little Tonawanda Creek near            0
  Linden, N.Y., 1913-67.
    Logarithmic                                            1        1      .243     -.630      .246
                                                                    2      .757     +.233      .363
Maximum 24-hour storm intensities, Farmington              0
  Warehouse, Utah, 1939-68.
    Logarithmic                                            1        1      .672    -4.491      .512
                                                                    2      .328    -3.252      .372
Chloride concentrations, Meadowbrook at Jamesville         0
  Road, Syracuse, N.Y., 1969-70 (109 samples).
    Logarithmic                                            1        1      .678     4.311      .518
                                                                    2      .322     6.776     1.124
Cohen's data (4). Nature and source unknown.               2        1      .276    46.467     5.356
                                                                    2      .724    57.322     3.217
                                                                    1      .510    50.608     6.704
                                                                    2      .490    58.191     1.851

Notes: All flood peaks in units of 1,000 c.f.s. All logarithms to natural base. Second solution to
Cohen data not given in reference. Farmington Warehouse data from U.S. Forest Service, Intermountain
Station, Logan, Utah.

Figure 3. Frequency curves for hydrologic data, comparing
mixed normal distributions to Pearson-III distributions. For
the Genesee River plot, the first solution from table 1 is shown.
Panel A: annual flood peaks, Genesee River near Scio, N.Y. Panel B:
maximum 24-hour storm intensities, Farmington Warehouse, Utah.
Abscissa: exceedance frequency, percent.

Accurate solution for the unknowns is inhibited
by a "small" sample size, causing poor estimates
of the higher moments. The problem of estimation
looms large in this solution, encouraging a priori
assumptions for some of the unknowns. Cohen (4)
created a composite sample of two assumed normal
samples, $n_1$ = 334 and $n_2$ = 672, for N = 1006. His
decomposition results, presented below, tend to
discourage mixed distribution decomposition without
prior knowledge, or large samples, or both.

Variable      Composite estimate ¹    Moment estimate    Percent error
$\alpha_1$          0.332                   0.276              16.9
$\mu_1$            47.72                   46.46                2.6
$\mu_2$            57.61                   57.32                 .5
$\sigma_1$          5.97                    5.35                7.6
$\sigma_2$          3.03                    3.22               -6.3

¹ "True" value from individual samples of $n_1$ = 334 and
$n_2$ = 672 before mixing.

Figure 4. Frequency curves for A, water quality data; and B,
data used by Cohen (4). Panel A: chloride concentrations, Meadow Brook
at Jamesville Road, Syracuse, New York, Nov. 1969-Oct. 1970 (N = 109).
Panel B: Cohen's data (first solution); individual points not given.
Abscissa: exceedance frequency, percent.
Large samples ($N \geq 1,000$) in normal situations
of hydrologic analysis are rare, and almost nonexistent
if an annual series is of interest. However,
certain other measurements are available in
quantity, such as storm rainfalls (see fig. 3B), daily
precipitations, streamflows, or wind movements,
tree ring data, grain size distributions, topographic
information (for example, distribution of elevations),
and water quality data (see fig. 4A). Care should be
taken to distinguish between data fitting the mixed
distribution concept and that more properly a
mixed variable phenomenon. The possibility of
multiple solutions should be recognized. Additionally,
the assumption of data independence should
be carefully examined or accounted for.

A mixed distribution analysis might be in order
in the following circumstances: (1) A shotgun
approach to data analysis of a large sample, with
further examination of the phenomenon if a decomposition
occurs; (2) decomposition of suspected
mixtures; (3) decomposition of population data or
100-percent samples. The term "large" sample
must be evaluated considering any a priori assumptions
made about the parameters.

Thus, the outright application of mixed distributions
by the procedure outlined to the ordinary
problems of hydrology is yet to be demonstrated,
mainly because of the difficulties in estimating the
higher moments. It does suggest a means of explaining
skewness and an alternative to logarithmic
transforms and Pearson-III distributions. Further
refinement and examination of both the technique
and its appropriateness to hydrologic problems
might yield further insight into hydrologic phenomena
currently masked as mixed distributions. The
phenomenon of mixed variables needs study and
definition, and the distinction from mixed distributions
should be emphasized.

Acknowledgments 

This  work  was  carried  out  as  a  part  of  the  re- 
search effort  of  the  State  University  of  New  York 
Water  Resources  Center,  in  cooperation  with  and 
receiving  support  from  the  SUNY  College  of 
Forestry  at  Syracuse  University.  The  Farmington 
Warehouse  data  was  supphed  by  the  U.S.D.A., 
Forest  Service,  Intermountain  Forest  and  Range 
Experiment  Station,  Logan,  Utah.  Correspondence 
and discussions with Dr. Krishan P. Singh of the
Illinois State Water Survey offered insight and
encouragement  in  the  earlier  stages  of  this  work. 

Literature  Cited 

(1) Anderson, D. V.
1966. Review of basic statistical concepts in hydrology. In Statistical Methods in Hydrology, pp. 3-27. Proceedings of Hydrology Symposium No. 5. Inland Waters Branch, Department of Energy, Mines, and Resources, Ottawa, Canada.

(2) Beard, Leo R.
1962. Statistical methods in hydrology. 74 pp. U.S. Army Engineer District, Sacramento, Calif.

(3) Charlier, C. V. L., and Wicksell, S. D.
1924. On the dissection of frequency functions. Arkiv för Matematik, Astronomi och Fysik, Bd. 18, No. 6.

(4) Cohen, A. C.
1967. Estimation in mixtures of two normal distributions. Technometrics 9(1): 15-28.

(5) Markovic, R. D.
1965. Probability functions of best fit to distributions of annual precipitation and runoff. 33 pp. Hydrol. Papers No. 8. Colorado State University.

(6) Medgyessy, P.
1961. Decomposition of superpositions of distribution functions. 228 pp. Hungarian Academy of Sciences, Budapest.

(7) Pearson, K.
1894. Contributions to the mathematical theory of evolution. Roy. Soc. London, Phil. Trans. 185: 71-110.

(8) Potter, W. D.
1958. Upper and lower frequency curves for peak rates of runoff. Amer. Geophys. Union Trans. 39(1): 100-105.

(9) Reich, B. M.
1969. Flood series for gaged Pennsylvania streams. Res. Pub. 63. 83 pp. The Institute for Land and Water Resources, The Pennsylvania State University.

(10) Sangal, B. P., and Biswas, A. K.
1970. The 3-parameter log-normal distribution and its applications in hydrology. Water Resources Res. 6(2): 505-15.

(11) Singh, K. P.
1968. Hydrologic distributions resulting from mixed populations and their computer simulation. Pp. 375-85. Paper presented at the Symposium on "The Use of Analog and Digital Computers in Hydrology." Tucson, Ariz.

(12) Stoddart, R. B. L., and Watt, W. E.
1970. Flood frequency prediction for intermediate drainage basins in southern Ontario. C.E. Res. Rpt. 66, 63 pp. Dept. Civ. Engin., Queen's University at Kingston, Ontario.



(13) Tung, L. H.
1966. Method of calculating molecular weight distribution function from gel permeation chromatograms. Jour. Appl. Polymer Sci. 10: 375-85.

(14) Upchurch, S. B.
1970. Mixed populations sediment in nearshore environments. 24 pp. Paper presented at Great Lakes Research Conference, Buffalo, N.Y.

(15) Water Resources Council.
1967. A uniform technique for determining flood flow frequencies. Water Resources Council Bul. 15, 15 pp.

(16) Yevdjevich, V., and Jeng, R. J.
1969. Properties of non-homogeneous hydrologic series. 33 pp. Hydrol. Papers No. 32. Colorado State University.


Appendix 

Solution  of  the  Mixed  Distribution  Equations 

For a range of assumed $a_1$ values, values of $\bar{X}_1$
(and thus by calculation $a_2$, $\bar{X}_2$, $s_1$, and $s_2$) are
chosen (through an incrementing, iteration, and
refinement procedure) such that the calculated
value of the quantity given by equation 7 is matched. The
range of $a_1$ can be from 0 to 1.00, and an increment
in the general size of 0.04 is satisfactory. All solutions
which provide negative values of $s_1^2$ or $s_2^2$
are rejected. Trial values of $\bar{X}_1$ range from $\bar{X} - 2s$
to $\bar{X}$. The output from this step is a table of
statistically and algebraically valid values of $a_1$
and $\bar{X}_1$.

The above procedure is then repeated, except
that the quantity given by equation 8 is now used as the criterion
for matching in selection of $a_1$ and $\bar{X}_1$. The two
values of $\bar{X}_1$ are then arrayed or plotted for common
$a_1$ values, as shown in figure 5. The intersections,
if any, provide the solution values for $a_1$ and $\bar{X}_1$,
since all six equations are satisfied. Then, values of
$a_2$, $\bar{X}_2$, $s_1$, and $s_2$ are successively calculated
directly from equations 3, 4, 5, and 6, respectively.
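
The intersection step described above can be sketched in Python. This is only an illustrative sketch, not the author's Fortran program: it assumes the two passes have already produced one trial value of the component-1 mean for each assumed weight on a common grid, and it locates crossings by linear interpolation between grid points, which is an assumption of this example. The function name and arguments are hypothetical.

```python
def find_intersections(a1_grid, x1_from_eq7, x1_from_eq8):
    """Locate crossings of two X1-versus-a1 curves tabulated on a common a1 grid.

    Each crossing is a candidate (a1, X1) solution. Linear interpolation
    between neighboring grid points is an assumption of this sketch.
    """
    points = list(zip(a1_grid, x1_from_eq7, x1_from_eq8))
    solutions = []
    for (a_lo, p_lo, q_lo), (a_hi, p_hi, q_hi) in zip(points, points[1:]):
        d_lo = p_lo - q_lo  # gap between the curves at the lower grid point
        d_hi = p_hi - q_hi  # gap between the curves at the upper grid point
        if d_lo == 0.0:
            solutions.append((a_lo, p_lo))  # curves touch exactly on the grid
        elif d_lo * d_hi < 0.0:
            frac = d_lo / (d_lo - d_hi)  # fractional position of the sign change
            solutions.append((a_lo + frac * (a_hi - a_lo),
                              p_lo + frac * (p_hi - p_lo)))
    if points and points[-1][1] == points[-1][2]:
        solutions.append((points[-1][0], points[-1][1]))
    return solutions
```

Each returned pair would then be passed to equations 3 through 6 to obtain the remaining component parameters; an empty list corresponds to the "no intersection" case.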

A program effecting this solution, written in
Fortran II-D for an IBM 1620 computer, requires
about 8 minutes. A general flow chart of the technique
is shown in figure 6. A copy of the program
used is available from the author upon request.

Figure 5. Solution for mixed distribution equations. Output of previous steps consists of an algebraically and statistically valid array
of assumed $a_1$ and $\bar{X}_1$ values that satisfy both computed and observed values of the quantities matched through equations 7 and 8. These are shown as plotted points.
From the values of $a_1$ (0.333) and $\bar{X}_1$ (1.000) derived from the above, $a_2$ is calculated (0.667), then $\bar{X}_2$, $s_1$, and $s_2$ (values illegible in the source).
There are as many solutions as intersections.

Symbols 

a, α      Weight or relative frequency factor for component distributions in the sample or in the population
X̄, μ      Arithmetic mean of sample or population
s, σ      Standard deviation of sample or population
g         Moment coefficient of skewness
c         As a subscript on sample moments or expectations. Indicates calculation, usually on a trial basis, through input of component parameter estimates (a, X̄, s) to equations 3 through 8.
exp       Exponentiation; for example, exp(X) = e^X
f( )      Frequency density
E( )      Expectation operator
ρ         Correlation coefficient, or dimensionless covariance
1, 2      As subscripts; pertaining to component distribution parameters. Component 1 is defined as that with the smallest mean. Unsubscripted symbols refer to the mixed (total) distribution parameters.


[Flow chart; legible steps: START; feed the sample moments as calculated from basic data; calculate and print s and g; assume $a_1$ (for example, in increments of 0.04).]

Figure 6. General flow chart of computer solution to mixed distribution equations for two normal components.


DEVELOPING  REGIONAL  STOCHASTIC  DATA  BASES 

By J. C. Wade, A. O. Weiss, and L. R. Beard ¹


Abstract 

Basic  to  all  water  resource  planning,  design,  and 
operation  studies  is  the  development  of  an  accept- 
able hydrologic  and  meteorologic  data  base.  Because 
most  of  the  available  historical  records  either  have 
(1)  gaps  located  within  an  otherwise  continuous 
set  of  observations,  or  (2)  long  periods  of  missing 
observations prior to gage installation or subsequent
to gage removal, the problem of how to
reconstitute  the  missing  observations  over  the 
appropriate  temporal  and  spatial  horizon  and  then 
generate  other  equally  likely  stochastic  data  sets 
is  one  of  considerable  importance. 

Several  methodologies  and  associated  computer 
programs exist for filling noncontinuous multistation
data sets. The successes and failures realized
ized in  striving  to  use  one  of  these  techniques,  the 
Monthly  Streamflow  Simulation  Program  (MOSS), 
in  eastern  Texas  are  discussed;  specific  modifica- 
tions which  were  made  to  MOSS,  in  order  to 
develop  acceptable  filled-in  data  bases,  are  also 
presented;  and  usage  procedures  are  suggested  for 
developing  filled-in  and  stochastic  data  bases  in 
eastern  Texas. 
Introduction

Extremely important to effective water resource
planning is accurate and complete hydrologic information
as a function of time.² Normally, this is
not available; however, through the years, hydrologists
have devised numerous methodologies for
extracting increasing amounts of useful information
from limited historical records of varying durations
within a region being analyzed. Toward this end,
methodologies varying from simple linear regression
models to the more complex multivariate
regression models have been developed and used
to estimate missing data in historical records, to
help convert historical records to predetermined
times in the past (for example, naturalized conditions),
and to help project future streamflow
conditions.

Consistent with these developments, one of the
purposes of this paper is to briefly describe a
procedure which is used by the Texas Water Development
Board to refine systematically the
historical hydrological data of Texas. This procedure
involves (1) quantifying the effects that
man-induced changes to the hydrological regime
have on the streamflow conditions, (2) adjusting
historical streamflow data to natural conditions,
(3) filling gaps in long duration records and extending
short duration records of naturalized streamflow
data, and (4) projecting the filled-in naturalized
information to various selected future conditions.
These four major steps and the adjustment parameters
used are graphically portrayed in figure 1.

The second purpose of this paper is to describe,
in detail, step 3 of the overall refinement procedure,
and thus discuss the problems encountered and
successes realized in using a multivariate Monthly
Streamflow Simulation Program (MOSS) developed
by the U.S. Army Corps of Engineers ³ for filling
in and extending naturalized streamflow records on
an example problem, the Sabine River Basin of
Texas. Although MOSS performs its analysis using
monthly time increments, daily records are also used
for determining lag time factors in order to account
for the effects of travel time on monthly streamflow.
Although the example problem is comprised of a
single river basin, the procedures described will
also handle multibasin configurations.

² Weiss, A. O., W. L. Meier, and L. R. Beard. EFFECTIVE USE
OF STOCHASTIC INFORMATION WITH DETAILED ANALYTIC
MODELS. International Symposium on Stochastic Hydraulics,
Pittsburgh, Pa., May 1971.

³ Hydrologic Engineering Center. HEC-4 MONTHLY STREAMFLOW
SIMULATION. U.S. Army Corps of Engineers, Davis,
Calif. 1971.


STEP 1. DATA COMPILATION AND SELECTION

• Compile, Verify, and Evaluate Available Historic Hydrologic Data
• Select a Representative Design Period

STEP 2. ADJUSTMENTS TO NATURAL CONDITIONS

• Adjust the Following Parameters for Man-Made Regulated Effects
    Municipal Use
    Industrial Use
    Agricultural Use
    Export
    Navigation-Recreation
    Municipal Return Flow
    Industrial Return Flow
    Agricultural Return Flow
    Import
    Reservoir Evaporation
    Reservoir Change in Storage
    Ground Water Use
• Adjust the Following Parameters for Man-Induced Unregulated Effects
    Farm Ponds
    Soil Conservation Service Structures
    Land Use

STEP 3. DATA FILL-IN AND EXTENSION

• Fill-in and Extend "Natural" Streamflow to Representative Design Period
• Generate Stochastic Data Sequences

STEP 4. PROJECTIONS TO FUTURE CONDITIONS

• Develop Projected Unregulated Streamflow Data for Future Watershed Development and Management
• Use Projected Regulated Water Use, Return Flows, and Reservoir Storage Data to Modify Projected Unregulated Streamflow Data

Figure 1. Summary of the four-step hydrology data refinement procedure.
The Hydrology Data Refinement Procedure

Step  l.  —  Data  Compilation  and  Selection 

This  step  involves  identifying,  collecting,  verify- 
ing, and  selecting  for  use  those  water-oriented 
historical  records  which  are  meaningful  to  the 
refinement  of  the  hydrological  data  base  for  the 
region  being  studied.  This  step  also  involves  se- 
lecting periods  of  time  from  the  historical  data 
base  over  which  continuous  monthly  information 
is  desired.  The  major  types  of  basic  data  param- 
eters required,  in  addition  to  streamflow  data,  are 
itemized  in  figure  1. 

Step  2.— Adjustments  to  Natural  Conditions 

During  this  step,  historical  streamflow  data  are 
adjusted  to  "natural"  or  unimpaired  conditions. 
This  involves  the  identification  and  quantification 
of  the  impact  that  changes  in  water  use,  return 
flow,  reservoir  storage,  land  use,  conservation 
practices,  and  floodwater  retarding  structures  have 
on  the  monthly  streamflow  conditions  as  historically 
observed. 

The  appropriate  streamflow  adjustments  are  made 
at  selected  key  control  points  (gages)  in  the  region 
of  study  for  a  period  of  time  generally  coinciding 
with  the  length  of  one  of  the  longer  streamflow 
records.  This  period  of  time  is  usually  a  minimum 
of  30  years,  and  generally  not  over  50  years.  A 
general  streamflow  accounting  model  and  numerous 
supporting  data  management  and  analysis  programs 
collectively  are  used  to  assist  the  planner  in  adjust- 
ing the  historical  data  to  "natural"  conditions  for 
the  period  of  record  selected. 

In  this  analysis,  a  total  of  15  parameters  identi- 
fying man-made  and  man-induced  effects  on  stream- 
flow,  as  summarized  in  figure  1,  are  evaluated.  The 
first  12  are  those  which  directly  regulate  streamflow, 
whereas  the  last  three  are  those  which  indirectly 
regulate  streamflow;  each  is  evaluated  separately 
because  of  their  differing  effects.  The  details  on 
the  actual  adjustment  techniques  are  beyond  the 
scope of this paper but can be found in a report by
Banks.⁴


Step  3.  — Data  Fill  In  and  Extension 

Step  3  consists  of  using  a  multivariate  stream- 
flow  simulation  program  to  fill  gaps  in  long  duration 
records  and  to  extend  short  duration  records  to  be 
consistent  with  the  longer  duration  records.  This 
is  done  on  a  monthly  basis  using  the  "natural" 
streamflow  data  developed  in  step  2.  Concurrent 
with  developing  a  completely  filled-in  data  set, 
other  equally  likely  stochastic  data  sequences  of 
streamflow  and  related  hydrological  parameters, 
for  prespecified  durations,  are  generated  and 
selected  for  use  in  planning  studies.  This  is  done 
according to the procedures described by the
Texas Water Development Board.⁵

Step  4.— Projections  to  Future  Conditions 

Step  4  consists  of  adjusting  (projecting)  the  filled- 
in  "natural"  streamflow  data  developed  in  step  3  to 
various  future  man-induced  conditions  of  watershed 
development,  groundwater  use,  and  river  basin 
management  for  use  with  planning,  designing,  and 
operating  studies.  This  "projected  unregulated 
streamflow"  data  is  developed  for  postulated  condi- 
tions of  basin  development  at  specific  future  points 
in time (for example, 1990, 2000, 2020) and for a
sequential period (for example, 1970-2020).
Stochastic sequences of streamflow and related
hydrologic parameters generated in step 3 are also
adjusted to given future conditions. Additional
adjustments are made to the "projected unregulated
streamflow" data during this step to account for
estimates of projected regulated water use, return
flow, and reservoir storage data. This is done before
using these data with the modeling procedures also
discussed in the Texas Water Development Board
Report 131.

The Problem

The planning region discussed herein is a portion
of Texas and, as shown in figure 2, the meteorology
varies considerably across the State. Precipitation
in the southeastern portions of the State is dominated
by weather patterns originating in the Gulf
of Mexico, whereas precipitation in the northwestern
portions of the State is dominated by the
massive Pacific frontal systems. Of particular
significance is the band of potential thunderstorm
activity resulting where these two systems meet,
which, in general, extends diagonally across Texas
from the southwest to the northeast. Because of
these effects, the precipitation exhibits a high
degree of variability.

⁴ Banks, Harvey O. STATUS REPORT - HYDROLOGIC DATA [remainder of entry illegible in the source].

⁵ Texas Water Development Board. STOCHASTIC OPTIMIZATION
AND SIMULATION TECHNIQUES FOR MANAGEMENT OF REGIONAL
WATER RESOURCE SYSTEMS. Tex. Water Devlpmt. Bd. Rpt. 131.

Figure 2. Average annual precipitation and streamflow in Texas.

Streamflow  in  Texas,  typical  of  most  streamflow 
in  the  southwestern  United  States,  is  also  highly 
variable.  Because  of  this  variability,  the  effective 


use  of  existing  data  fill-in  programs  such  as  MOSS 
is  extremely  difficult.  Proper  application  of  these 
programs  requires  knowledge  of  the  problem  being 
studied,  extensive  knowledge  of  the  fill-in  models 
themselves,  considerable  thought  before  usage, 
and  quantitative  evaluation  of  the  filled-in  data 
sets  before  their  acceptance  for  use  in  planning, 
design,  or  operation  studies. 

The example problem discussed herein (the Sabine
River Basin) contains approximately 9,800
square miles of drainage area in Texas and Louisiana.
As shown in figure 3, the basin extends from
the northeastern part of Texas along the Texas-Louisiana
boundary to the Gulf of Mexico, a distance
of 525 miles. A comparison with figure 2
reveals that the precipitation and the resulting
streamflow in the Sabine River Basin are among
the highest in Texas. Figure 3 also gives the period
of historical record used in this example problem
for each of the 10 streamflow gaging stations for
which filled-in records are desired.⁶ These stations
were selected from a total of approximately 20
available stations within the basin and then filled
or extended to coincide with the longest existing
record in the basin, that is, Logansport, La.

⁶ Streamflow data after 1964 were not included in the example
problem because of the addition of a large reservoir (Toledo Bend)
within the river basin in that year.

As  is  also  shown  in  figure  3,  the  10  selected 
stations,  according  to  the  selection  criteria,  are 
uniformly  distributed  along  the  mainstem  of  the 
basin  and  along  various  major  tributaries  in  the 
upper  portion  of  the  basin. 

The  selection  criteria  were,  in  this  case,  quite 
simple  because  there  were  only  10  major  stations 
in  the  basin  with  sufficient  record  length  at  which 
meaningful  "natural"  streamflow  data  were  devel- 
oped. Also,  MOSS  has  the  capability  to  simul- 
taneously analyze  a  maximum  of  10  stations.  How- 
ever, in  cases  where  more  than  10  stations  must  be 
filled,  a  multipass  procedure  is  used. 

The objective of the multipass procedure is the
same as the objective for the single pass procedure:
the preservation of all pertinent frequency,
serial, and cross-correlation characteristics of the
historical data set. In order to preserve all of
these characteristics properly in a multistation data
set, it is necessary to compute and use a correlation
matrix involving all possible pairs of stations. With
MOSS and other fill-in programs, this is computationally
intractable, mathematically unstable, and
too large a problem for most existing computers
when the number of gaging stations becomes
large. Therefore, in order to transfer the proper
cross-correlation information from the historical
data set to the filled-in and stochastically generated
data sets, the following general criteria are used to
guide the multipass fill-in procedure.

Figure 3. Location and period of record for selected streamflow gaging stations in the Sabine River Basin.

First, the planner should select and fill in 10
base streamflow or precipitation stations, or both,
from those which are available and which (1) are widely
distributed  throughout  the  region  being  studied, 
and  (2)  have  the  longer  periods  of  record.  The  se- 
lection of  these  base  stations  should  also  be  founded 
upon  obtaining  a  base  data  set  which  tends  to 
represent  the  average  degree  of  cross-correlation 
throughout  the  entire  region  being  studied,  not  one 
which  maximizes  cross-correlation. 

Secondly,  from  a  statistical  viewpoint,  the  region 
should  be  divided  into  clusters  of  subregions  with 
each  cluster  including  one  or  more  of  the  previously 
selected  base  stations.  The  number  of  base  stations 
included  in  each  subregion  is  highly  dependent 
upon  the  total  number  of  stations  actually  being 
filled  as  well  as  the  number  which  are  considered 
necessary  to  maintain  consistent  correlations 
among  the  entire  data  set.  In  general,  as  the  varia- 
bility increases  so  does  the  number  of  stations 
required  to  maintain  this  consistency. 

A  preliminary  aspect  of  the  development  of  "nat- 
ural" streamflow  in  step  2,  which  impacts  greatly 
upon  the  successful  use  of  MOSS,  is  the  proper 
adjustment  of  historical  monthly  streamflow  before 
filling gaps. This is done to account for factors such
as  travel  time  between  streamflow  gaging  stations, 
differences  in  runoff  times  of  concentration  at 
different  stations,  and  storm  travel  effects.  Because 
of  these  factors,  monthly  historically  recorded 
streamflow  at  different  locations,  in  the  same  or  in 
different  basins,  does  not  cross-correlate  as  highly 
as  do  flows  appropriately  offset  in  time.  Also, 


because  of  travel  time  between  stations,  water 
resource  system  studies  using  monthly  computa- 
tional intervals  need  monthly  streamflow  data 
appropriately  offset  in  time.  The  amount  by  which 
this  data  should  be  offset  is  not  necessarily  the  same 
for  the  two  purposes.  Furthermore,  if  alternative 
flow  paths  are  possible  between  two  locations  (that 
is,  when  flows  in  a  basin  are  diverted  through  a 
canal  from  one  tributary  to  another),  the  amount  by 
which  flows  should  be  offset  might  be  different  for 
the  two  paths,  thus  resulting  in  a  condition  that  is 
impossible  to  satisfy  without  special  treatment 
during  the  system  study. 

In  view  of  these  conflicting  requirements,  it  is 
considered  that  reconstitution  of  missing  monthly 
data  should  be  based  on  monthly  streamflow 
quantities  that  are  offset  in  time  in  such  a  way  as  to 
maximize  cross-correlations  between  stations  and 
thus  maximize  the  information  transfer  between 
stations.  In  the  event  that  a  different  offset  is  needed 
in  project  studies,  special  routines  must  be  used, 
such  as  combining  fixed  percentages  of  flows  for 
adjacent months.⁷
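
The idea of offsetting monthly flows by "combining fixed percentages of flows for adjacent months" so as to maximize cross-correlation can be illustrated with a short Python sketch. It is not taken from MOSS: the function names, the candidate fractions, and the choice to blend each month with the preceding month at the upstream station are assumptions made only for illustration.

```python
import math


def correlation(xs, ys):
    """Simple (Pearson) correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def blend_adjacent_months(flows, fraction_from_previous):
    """Offset a monthly series by mixing each month with the month before it."""
    f = fraction_from_previous
    return [(1.0 - f) * flows[m] + f * flows[m - 1] for m in range(1, len(flows))]


def best_blend_fraction(upstream, downstream, candidates):
    """Return the (fraction, correlation) pair giving the highest cross-correlation."""
    target = downstream[1:]  # align with the blended series, which drops month 0
    scored = [(correlation(blend_adjacent_months(upstream, f), target), f)
              for f in candidates]
    best_corr, best_f = max(scored)
    return best_f, best_corr
```

In practice the candidate fractions would come from daily records and travel-time estimates rather than from a blind search, as the text indicates.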

The  Fill-in  Technique 

In  the  statistical  analysis  portion  of  MOSS,  the 
flows  for  each  calendar  month  at  each  station  are 
first  incremented  by  1  percent  of  their  calendar- 
month  average  in  order  to  prevent  taking  the 
logarithm of zero flows. This increment is subtracted
after the data are filled in. The mean,
standard  deviation,  and  skew  coefficients  of  the 
logarithms  of  incremented  flows  for  each  station 
and  calendar  month  are  computed  as  follows: 

$$X_{i,m} = \log\,(Q_{i,m} + q_i) \tag{1}$$

$$\bar{X}_i = \sum_{m=1}^{N} X_{i,m} / N \tag{2}$$

$$S_i = \sqrt{\sum_{m=1}^{N} (X_{i,m} - \bar{X}_i)^2 / (N-1)} \tag{3}$$

$$g_i = \frac{N \sum_{m=1}^{N} (X_{i,m} - \bar{X}_i)^3}{(N-1)(N-2)\,S_i^3} \tag{4}$$

where

X = Logarithm of incremented monthly flow
Q = Monthly recorded streamflow
q = Small increment of flow used to prevent infinite logarithms for months of zero flow
X̄ = Mean logarithm of incremented monthly flows
N = Total years of record
S = Unbiased estimate of population standard deviation
g = Unbiased estimate of population skew coefficient
i = Month number
m = Year number.

⁷ See footnote 3.
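
Equations 1 through 4 can be illustrated with a minimal Python sketch for one station and one calendar month. It is not the MOSS source code: the function name is hypothetical, base-10 logarithms are assumed (the text says only "logarithm" and "antilog"), and the increment is taken as a fraction of the calendar-month average as described above.

```python
import math


def log_moment_statistics(flows, increment_fraction=0.01):
    """Return (q, logs, mean, std, skew) for one station and calendar month.

    flows holds one value per year of record (at least three, not all equal).
    Base-10 logarithms are an assumption of this sketch.
    """
    n = len(flows)
    q = increment_fraction * sum(flows) / n            # increment q_i
    logs = [math.log10(flow + q) for flow in flows]    # equation 1
    mean = sum(logs) / n                               # equation 2
    deviations = [x - mean for x in logs]
    std = math.sqrt(sum(d ** 2 for d in deviations) / (n - 1))   # equation 3
    skew = (n * sum(d ** 3 for d in deviations)
            / ((n - 1) * (n - 2) * std ** 3))          # equation 4
    return q, logs, mean, std, skew
```

For example, `log_moment_statistics([0.0, 50.0, 100.0])` uses an increment of 0.5, so the zero-flow year still has a finite logarithm.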

Each individual flow is then converted to a
normalized standard variate, using the following
approximation of the Pearson Type III distribution:

$$t_{i,m} = (X_{i,m} - \bar{X}_i)/S_i \tag{5}$$

$$K_{i,m} = \frac{6}{g_i}\left[\left(\frac{g_i\,t_{i,m}}{2} + 1\right)^{1/3} - 1\right] + \frac{g_i}{6} \tag{6}$$

where

t = Pearson Type III standardized variate
K = Normal standardized variate.
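
A small Python sketch of equations 5 and 6 follows. The function names are hypothetical, and two details are assumptions of this example rather than statements about MOSS: a zero skew coefficient is treated as the identity transform, and the cube root of a negative argument is taken as the real (negative) root.

```python
import math


def standardize(x, mean, std):
    """Equation 5: Pearson Type III standardized variate t."""
    return (x - mean) / std


def pearson_to_normal(t, g):
    """Equation 6: approximate normal standardized variate K for skew g."""
    if g == 0.0:
        return t  # assumed limiting case: no skew, no transformation
    base = g * t / 2.0 + 1.0
    # Real cube root, valid for negative arguments as well (assumption).
    cube_root = math.copysign(abs(base) ** (1.0 / 3.0), base)
    return (6.0 / g) * (cube_root - 1.0) + g / 6.0
```

With a skew coefficient of 1.0, a flow logarithm exactly at the mean (t = 0) maps to K = 1/6, which shows how the transformation shifts the center of a skewed distribution.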

After  transforming  the  flows  for  all  months  and 
stations  to  standardized  variates,  the  simple  correla- 
tion coefficients  R  between  all  pairs  of  stations  for 
each  current  and  preceding  calendar  month  are 
computed.  If  there  are  insufficient  simultaneous 
observations  of  any  pair  of  variables  to  compute  a 
required  correlation  coefficient,  that  coefficient  is 
estimated. 

Missing  monthly  streamflow  data  for  the  various 
stations  are  then  computed  for  each  month  in  the 
data set according to the following basic strategy.
A  regression  equation  in  terms  of  normal  standard 
variates  is  developed  by  selecting  the  required 
coefficients  from  the  correlation  matrix.  Whenever 
there  exists  a  value  at  all  other  stations  in  the  data 
set  for  the  current  month,  these  values  are  used  to 
compute  the  missing  value. 
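
The regression step can be sketched in Python for standardized variates, where the regression coefficients follow directly from correlation coefficients. This is an illustration under stated assumptions, not the MOSS code: the function names are hypothetical, the small linear solver is included only to keep the example self-contained, and the optional random component is drawn from a normal distribution whose variance equals the error variance of the regression.

```python
import random


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - partial) / aug[r][r]
    return x


def fill_missing_variate(r_xx, r_xy, known_variates, rng=None):
    """Estimate one missing standardized variate from known ones.

    r_xx: correlations among the predictor stations (square matrix).
    r_xy: correlations between each predictor and the station being filled.
    Returns (estimate, error_variance); a random component is added if rng is given.
    """
    beta = solve_linear_system(r_xx, r_xy)
    estimate = sum(b * k for b, k in zip(beta, known_variates))
    r_squared = sum(b * r for b, r in zip(beta, r_xy))
    error_variance = max(0.0, 1.0 - r_squared)
    if rng is not None:
        estimate += rng.gauss(0.0, error_variance ** 0.5)
    return estimate, error_variance
```

For a single predictor with correlation 0.8 and a known variate of 1.5, the deterministic estimate is 1.2 and the error variance is 0.36. Passing `rng=random.Random(seed)` adds the random component.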

However,  if  data  for  the  current  month  is  missing 
for  any  station,  the  preceding  month's  value  at  that 
station  is  used  in  combination  with  the  current 
values  at  all  other  stations.  The  missing  value  is 
estimated  using  the  regression  equation  and  strategy 
discussed  above,  introducing  a  random  component 


whose  variance  equals  the  error  variance  of  the 
regression  equation.  As  a  result  of  using  the  preced- 
ing month's  streamflow  at  some  stations,  the  cor- 
relation matrix  is  not  time  consistent  with  the 
data  matrix;  thus,  all  affected  correlation  co- 
efficients are  recomputed  after  estimating  each 
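missing value. After fill-in, the normal standard
variates are transformed back to streamflow values
by using the following equations:

$$t_{i,m} = \left\{\left[\frac{g_i}{6}\left(K_{i,m} - \frac{g_i}{6}\right) + 1\right]^3 - 1\right\}\frac{2}{g_i} \tag{7}$$

$$X_{i,m} = \bar{X}_i + t_{i,m}\,S_i \tag{8}$$

$$Q_{i,m} = \operatorname{antilog} X_{i,m} - q_i \tag{9}$$

imposing the constraint:

(10) [equation not legible in the source]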

Despite  the  fact  that  MOSS  incorporates  statis- 
tical techniques  designed  to  fit  hydrologic  functions 
more closely than many existing models, some serious
problems have been encountered in application.
In  the  Texas  studies,  the  most  serious  deficiency 
is  the  occasional  generation  of  extremely  high  or 
low  values  that  are  not  reasonable.  Because  of  these 
high  values,  there  is  also  a  strong  tendency  for 
the average of the generated or filled-in streamflow
data  to  exceed  the  average  of  the  historical  stream- 
flow.  This  is  due  to  the  fact  that  the  extremely 
high  generated  data  more  than  offset  the  extremely 
low  generated  streamflow.  Streamflow  values  are 
bounded  by  zero  on  the  low  side;  thus,  any  negative 
flows  are  set  equal  to  zero. 

Although not experienced in Texas applications,
the applications of MOSS in some other regions
indicate that the model will not reconstitute extreme
historical droughts with any acceptable
frequency. This is thought to be caused by the fact
that the persistence of low flows is characteristically
different from the persistence of high flows.
The MOSS model uses a uniform error variance
(random component of generated streamflow
logarithms) with respect to the magnitude of computed
streamflow, whereas experience in some
regions indicates that the error variance is smaller
for low streamflow than for high streamflow.

Careful  analysis  of  the  generation  of  occasional 
extreme and unlikely flows indicates that the cause
is inadequate fitting of the frequency distributions
of flow logarithms for a given calendar month by
the Pearson Type III function. This function is
bounded by a standard variate magnitude equal to
-2.0/g. For example, if the skew coefficient, g, is
2.0, it is theoretically impossible to obtain a standard
variate value smaller than -1.0. However, in
observed  samples  (particularly  small  samples) 
this  condition  might  occur  simply  because  the 
sample  data  do  not  fit  the  logarithmic  Pearson  Type 
III  function.  The  transformed  normalized  variate 
is  then  infinite.  Even  if  "impossible"  values  do 
not  exist  in  the  sample,  some  values  can  exist 
where  transformed  normalized  variates  exceed  a 
value  of  plus  or  minus  4  or  even  5,  which  repre- 
sents a  virtually  impossible  streamflow  quantity. 
This  is  not  too  serious  in  itself  since  such  magni- 
tudes are  retransformed  to  flows  using  the  same 
function,  but  they  are  also  employed  in  regression 
equations  which  relate  to  other  functions  charac- 
terized by  different  skew  coefficients.  When  this 
occurs,  the  extreme  standardized  variates  gener- 
ated are  occasionally  transformed  to  flows  that  are 
completely  unreasonable. 

The  solution  to  this  problem  is  complicated  by 
the  fact  that  streamflows  (and  probably  other 
hydrologic  variables)  do  not  conform  to  a  simple  or 
unique  mathematical  function.  Even  if  they  did, 
sample data in some cases yield statistics or coefficients
for that function substantially different
from  values  of  the  true  population.  Consequently, 
it  is  possible  that  observed  events  can  be  assigned 
extreme  (and  unrealistic)  probabilities  simply  be- 
cause the  type  of  function  or  its  calibration  coeffi- 
cients are  subject  to  uncertainty. 

To avoid unrealistic values of normalized standardized
variates, the following provisions have been
incorporated into MOSS:

•  1.  Computed skew coefficients are constrained
within the limits of ± 7.

•  2.  Transformed values of normalized variates
of the data set are adjusted to assure
that their variance is unity.

It is considered that these provisions represent
a practical compromise between fitting data sets
with great fidelity and avoiding abnormalities due
to model simplifications or sample irregularities.
It is also believed that no mathematical model can
avoid abnormalities due to sample irregularities.

The problem of reconstituting extreme historical
droughts is attacked simply, by calculating the rate
of change of error (random component) variance
with event magnitude. Mechanically this is done by
dividing the difference between the error variance
of positive sample variates and the error variance
of negative sample variates by the difference between
the average positive variate and the average
negative variate. This ratio is then applied to the
computed standardized deviates in order to obtain
the variance of the random component.
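
The ratio described above can be sketched as follows. This is a minimal illustration, not the MOSS code. The function name and the paired input lists are hypothetical, and the population variance is assumed because the text does not say which variance estimator is used. Only the ratio itself is computed, since the text does not give the exact rule by which it is applied to the standardized deviates.

```python
from statistics import mean, pvariance


def error_variance_slope(variates, errors):
    """Rate of change of error variance with event magnitude.

    variates: standardized sample variates (hypothetical input)
    errors:   the random (error) component paired with each variate
    """
    pos = [(v, e) for v, e in zip(variates, errors) if v > 0.0]
    neg = [(v, e) for v, e in zip(variates, errors) if v < 0.0]
    # Error variance on each side of zero (population variance assumed).
    var_pos = pvariance([e for _, e in pos])
    var_neg = pvariance([e for _, e in neg])
    # Average positive and average negative variate.
    mean_pos = mean(v for v, _ in pos)
    mean_neg = mean(v for v, _ in neg)
    return (var_pos - var_neg) / (mean_pos - mean_neg)


# Made-up numbers, used only to show the arithmetic.
slope = error_variance_slope([1.0, 2.0, -1.0, -2.0], [0.5, -0.5, 0.1, -0.1])
```

A positive ratio indicates that the random component is more variable for high flows than for low flows.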

Through these somewhat simple, but extremely
important refinements, it is thought that MOSS now
demonstrates the capability to satisfactorily fill
in streamflow in the more humid areas of Texas for
use with the Hydrology Refinement Study and subsequent
modeling activities. The results of the refinements
are partly shown in the following section
for a portion of the problem shown in figure 3.


Application  Results 

A set of three streamflow gaging stations and 15
test cases are presented to demonstrate the capabilities
and limitations of a version of MOSS which
was modified to better analyze and fill in streamflow
data in eastern Texas. The purpose of this presentation
is to provide direct statistical comparisons
of the filled-in sequences with the historical sequence
and to use the results of the comparisons
to help verify the degree of effectiveness of the
data fill-in technique. The stations chosen for these
test cases are three of those in the 10-station set
shown in figure 3. They are the Big Sandy Creek
gage (labeled 10 in fig. 3), the Lake Fork Creek
gage (labeled 9 in fig. 3), and the Sabine River near
Gladewater gage (labeled 2 in fig. 3).

For each of the 15 test cases a selected portion
(for example, 6 years) out of the 30 years of continuous
historical data is removed and then refilled
using MOSS. In the example problem, this is done
for each of the five independent 6-year periods in
the 30 years of continuous record at each of the
three stations selected, which gives a total of
15 test cases.

An example of the results of one case within the
15-run set of test cases is shown in fig. 4. This
figure graphically shows the filled-in and historical
sequences for the Big Sandy Creek gage for the
period 1939 through 1944 (test case 6 as shown in
fig. 5). In addition, figure 5 summarizes meaningful
6-year mean and standard error information on
each of the 15 test cases, and shows, in general,
how the filled-in data sets compare, on the average,
with the original corresponding historical data.
First of all, in two-thirds of the 15 test cases the
filled-in data underestimate the 6-year historical
mean. The net effect of this is that the overall mean
of the filled-in data sets is 4.6 percent less than the
mean of the historical data set.

Also, the 6-year means of the filled-in data for
the 15 test cases range from a high of 32 percent
greater than the corresponding historical data set
on test case 10 (a low flow period) to 31 percent less
than the historical set on test case 7 (a high flow
period). It is also interesting to note that:

•  Only  four  of  the  15  test  cases  have  percent 
deviations  greater  than  20  percent,  all  of 
which  are  in  the  two  stations  with  the  lower 
average  flows  (stations  9  and  10). 

•  The average negative percent deviation is
-15 percent.

•  The average positive percent deviation is
+14 percent.

•  Eight  of  the  15  test  cases  have  percent 
deviations  of  less  than  ±  15  percent. 

•  Five  of  the  15  test  cases  have  percent 
deviations  of  less  than  6  percent. 
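
The percent deviations quoted here compare a filled-in mean with the corresponding historical mean. The sketch below assumes the conventional definition (difference divided by the historical value). The function name is hypothetical. The two input values are the 30-year Sabine River means that can be read from figure 5.

```python
def percent_deviation(filled, historical):
    """Deviation of the filled-in mean from the historical mean, in percent."""
    return 100.0 * (filled - historical) / historical


# 30-year means for the Sabine River near Gladewater, as read from figure 5
# (monthly averages in tens of acre-feet).
sabine = percent_deviation(filled=11227, historical=12029)
```

Under this definition the Sabine River value rounds to -7 percent, which agrees with the figure.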


Figure 4. Historic and filled-in monthly streamflow, Big Sandy Creek, Sabine River Basin, Texas.

Entries marked [?] are illegible.

Lake Fork Creek (Station 10)
                      1939-44   1945-50   1951-56   1957-62   1963-68   30-year average (percent deviation)
  Test case              1         2         3         4         5
  Historical mean      2197      4196      1304      3371      1586      2165
  Filled-in mean       2671      4474      1088      256[?]    1272      2453 (13%)
  Standard error       1693      3837      [?]280    2601      2016

Big Sandy Creek (Station [?])
                      1939-44   1945-50   1951-56   1957-62   1963-68   30-year average (percent deviation)
  Test case              6*        7         8         9        10
  Historical mean       [?]       [?]       [?]       [?]       [?]       [?]
  Filled-in mean       1028      1296       596      1137       682       [?]
  Standard error       1531       479       835       640       [?]

Sabine River near Gladewater (Station 2)
                      1939-44   1945-50   1951-56   1957-62   1963-68   30-year average (percent deviation)
  Test case             11        12        13        14        15
  Historical mean     12650     18339      6103     14583      7761     12029
  Filled-in mean      11070     17868      6397     1360[?]    6384     11227 (-7%)
  Standard error       8408     11677      3935      6384      4357

Totaled flows
                      1939-44   1945-50   1951-56   1957-62   1963-68
  Historical mean     15830     24423      8052     19293      9863
  Filled-in mean      14769     23630      8081     17309      8338
  Percent deviation    (-7%)     (-3%)     (+4%)    (-10%)    (-15%)

* Figure 4 illustrates the detailed results of Test Case 6.

Figure 5. Results of MOSS test cases (all flows are monthly averages in tens of acre-feet).


In addition to the percent deviation type of
information, the standard error was used during the
process of refining MOSS as an indicator of the
improvement of the fill-in technique. (The lower the
standard error the better the fill-in technique.)
Since the start of the model refinement process, the
standard error on this set of test cases has been
reduced by approximately 40 percent, a very
significant improvement. The most notable improvement
was in the test cases where the original
standard errors were the greatest. As a result of the
fill-in improvements, some small increases (perhaps
not statistically significant) occurred in several of
the test cases where the original standard errors
were the smallest.

Conclusions 

Some of the problems of describing statistically
the physical system with a limited data base have
been discussed and analyzed. Solutions for the
specific problems of excessively large estimation
of single points, large and small sequence means,
and persistence of critical drought periods have
been presented. The tests used in analyzing filled-in
data sequences indicate that considerable
improvements have been made in the fill-in process.

The selected stations in the Sabine River Basin
have served as an initial test for forming a regional
data base. The data fill-in techniques have been
tested using the complete river basin network and
a basin subnetwork. The subnetwork analysis has
shown that for varying hydrologic conditions MOSS
adequately estimates missing streamflow data for
regional planning, design, and operation studies.

Although the example problem is from the Sabine
River Basin of Texas, the procedures described
herein are important in any region being studied.
The techniques and methodology discussed are
being used to develop a regional stochastic data
base. However, each subregion requires individual
assessment of the effectiveness and adequacy of
MOSS to provide the required fill-in service. If the
model is found to be inadequate, further refinements
may be defined and implemented before acceptance
and use of either a filled-in or stochastically generated
data base.


In final appraisal, stochastic estimation of missing
streamflow data is significant and adequate to
support regional planning activities. However, one
must emphasize that an understanding of the
objectives of the area being studied and the use of
the filled-in or stochastically generated data is
important in resolving the many potential pitfalls
of improper problem formulation.


STATISTICAL  MODEL  OF  SHORT-DURATION  PRECIPITATION  EVENTS 


By T. A. Austin and B. J. Claborn


Abstract 

A mathematical model of short-duration precipitation
events is presented. The model assumes these
short-duration events are random in nature and they
can be described by a set of stationary probability
distribution functions. The model describes the
storm depth, duration, and the time interval since
the occurrence of the last storm event. In addition,
a procedure for distributing the rainfall depth during
the storm is proposed. This model is useful in providing
stochastic inputs to rainfall-runoff models in
order to evaluate the runoff from a variety of storm
and antecedent conditions.

Data from Lubbock, Tex., were used to calibrate
and verify the model.


Introduction 

The effective management of the water resources
of any region is dependent upon the ability to
predict future water supplies, either rainfall or
runoff, and to implement the resource developments
which maximize economic returns from
these future supplies. A statistical model of short-duration
precipitation events is presented which is
capable of producing long sequences of events to
serve as input to a rainfall-runoff model. Integration
of these sequences into a systematic analysis of the
resource system will lead to improved planning and
implementation of resource developments and
movement toward the economic optimum resource
allocation.

In many arid and semiarid areas, the precipitation
regimen is often dominated by short-duration
convective events resulting from the rapid vertical
movement of unstable air masses. The available
moisture in the atmosphere in these environments
is generally low, and as a result, the majority of
these events are low volume. However, when adequate
moisture is available, these short-duration
events can be large volume, violent storms resulting
in major flooding of flat, low-lying areas. In these
areas, this model provides a useful tool in describing
the precipitation phenomenon.

The  Precipitation  Model 

The  model  was  developed  to  generate  short- 
duration  precipitation  sequences  which  can  serve 
as  direct  input  to  a  rainfall-runoff  model.  The  model 
produces  a  time  series  of  precipitation  events  by 
separately  generating  the  storm  characteristics 
(depth,  duration,  and  the  time  since  the  last  storm) 
and  the  time  distribution  of  rainfall  during  the  storm. 

The model was developed with the hypothesis
that precipitation is a random process governed by
the laws of probability and chance. In addition, it
is assumed that the probabilities associated with
the precipitation process are stationary, that is,
the probabilities do not change with time. If the
stationarity assumption is valid, the probability
density functions governing the precipitation
process can be approximated by histograms of the
historical data. In general, as the number of historical
data points increases, the histogram will
better approximate the true density function with
the two coinciding in the limit. The method employed
in this study to measure the goodness-of-fit
of a set of data to a hypothesized density function
is the Kolmogorov-Smirnov goodness-of-fit test
(2, 3, 7).

The usefulness of the data generated by this
model lies in the ease with which they can be incorporated
into a rainfall-runoff model. In order to
improve the integration of the model's sequences,
a second phase is incorporated which distributes
the rainfall depth during the storm in terms of a
series of short-time-interval storm intensities.

Details of the Model

The  operations  of  the  model  can  be  classified 
into  two  broad  categories:  (1)  Generation  of  storm 
properties  (storm  depth,  duration,  and  the  time 
interval  since  the  last  storm)  and  (2)  distribution  of 
the  rainfall  depth  as  a  sequence  of  short-time- 
interval  storm  intensities. 

Storm  Properties 

The model was developed with the hypothesis
that the probability of occurrence of short-duration
rainfall depths and durations is governed by a bivariate
density function. This assumption implies
a statistical dependence between the depth and
duration of rainfall.

The shape of a bivariate joint density function is,
in itself, difficult to define from the historical data.
Thus, the marginal and conditional density functions
are used to define the joint density function
since the marginal and conditional density functions
can be defined more easily from the data. The joint,
marginal, and conditional density functions are
related by:


$$g(x \mid y) = \frac{f(x, y)}{h(y)} \tag{1}$$

where

$f(x, y)$ is the bivariate joint density function
of the random variables $x$ and $y$,

$h(y)$ is the marginal density function of the
random variable $y$, and

$g(x \mid y)$ is the conditional density function of
the random variable $x$ assuming the
random variable $y$ is fixed at some
predetermined value.

Once  the  joint,  marginal,  and  conditional  density 
functions  are  defined,  the  generation  process  can 
begin. 

The  generation  procedure  is  begun  by  obtaining 
a  random  sample  from  the  cumulative  marginal 
density  function  of  one  of  the  random  variables  — 
for  example,  storm  duration  — by  using  Monte 
Carlo  simulation.  With  the  value  of  storm  duration 
known,  a  random  sample  representing  the  storm 
volume can be determined from the conditional
density  function  of  storm  volume  assuming  the 
storm  duration  is  fixed  at  the  previously  generated 
value. 


This method can be used to generate storm volumes
and durations regardless of the shape of the
joint density function. The procedure is simplified,
however, if the variables are from a bivariate
normal density function. If the joint density function
is bivariate normal, the marginal and conditional
density functions will be univariate normal.
Morrison has shown that the conditional distribution
function of a bivariate normal population with
one variate constant is univariate normal with mean
and standard deviation given by (3):


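$$\bar{X}_{1 \cdot 2} = \bar{X}_1 + r \, \frac{\sigma_1}{\sigma_2} \, (X_2 - \bar{X}_2) \tag{2}$$

$$\sigma_{1 \cdot 2} = \sigma_1 \sqrt{1.0 - r^2} \tag{3}$$

where

$\bar{X}_{1 \cdot 2}$ is the mean of the conditional
distribution function,

$\sigma_{1 \cdot 2}$ is the standard deviation of the
conditional distribution function,

$\bar{X}_1$ and $\bar{X}_2$ are the means of the random
variables, respectively,

$\sigma_1$ and $\sigma_2$ are the standard deviations of the
random variables, respectively,

$X_2$ is the value at which the second variable
is held constant, and

$r$ is the simple correlation coefficient
between the two variables.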
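
The conditional mean and standard deviation of equations 2 and 3 can be sketched in a few lines. This is a minimal illustration that assumes the standard bivariate normal result. The function name and the numbers in the example are hypothetical and are not the Lubbock values.

```python
import math


def conditional_normal(mean1, sd1, mean2, sd2, r, x2):
    """Mean and standard deviation of variable 1 given variable 2 = x2.

    mean1, sd1: mean and standard deviation of the variable being generated
    mean2, sd2: mean and standard deviation of the variable held fixed
    r:          simple correlation coefficient between the two variables
    """
    cond_mean = mean1 + r * (sd1 / sd2) * (x2 - mean2)  # equation 2
    cond_sd = sd1 * math.sqrt(1.0 - r * r)              # equation 3
    return cond_mean, cond_sd


# Illustrative numbers only: two standardized variables with r = 0.6,
# the second fixed one standard deviation above its mean.
m, s = conditional_normal(0.0, 1.0, 0.0, 1.0, 0.6, 1.0)
```

With r = 0 the conditional mean and standard deviation reduce to the marginal values, which is the independent case.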

The time between short-duration precipitation
events is assumed to be governed by a univariate
density function which implies it is statistically
independent of rainfall volume and duration.
Monte Carlo simulation is used to obtain a random
sample from the cumulative density function governing
the time between events.

Time between precipitation events may be
dependent upon the season of the year in areas
where two or more types of precipitation occur,
thus requiring more than one probability distribution
to describe the time between these precipitation
events. For example, in an area where the
precipitation regimen is composed of heavy winter
snows and summer thunderstorms, two different
univariate probability distributions describing the
time between precipitation events may be required.

Time Distribution of Rainfall Within
the Storm

The time distribution of rainfall within a storm is
critical in predicting the runoff regimen since most
sophisticated runoff models require the input of
short-time-interval rainfall intensities. This model
distributes the storm depth as a series of short-time-interval
rainfall intensities. Monte Carlo
simulation is used to obtain random samples from
the probability density function governing the
occurrence of the short-time-interval intensities.

A statistical model must maintain all statistical
properties observed in the historical data; therefore,
the serial correlation of various lags between
the short-time-interval rainfall intensities, in addition
to the first and second moments, must be maintained.
If a process is a truly random process, the
serial correlation coefficients of all lags are zero.
However, since the sample size used in determining
the serial correlation coefficient is generally
small and the data are subject to sampling errors,
the serial correlation coefficient may differ from
zero, even for a pure random process. The model
used to distribute the rainfall volume during the
storm assumes that the process governing the time
distribution of rainfall is purely random; therefore,
no significant serial correlation exists.

The time distribution phase determines the
number of short-time-interval storm intensities
required to maintain a given duration. If required,
the first time interval is adjusted to ensure that
the exact duration is always preserved. A series of
simulated storm intensities is then generated by
obtaining random samples from the probability
density function governing the short-time-interval
storm intensities. The simulated storm depth is
then calculated and compared to the previously
generated depth. If this simulated depth does not
agree with the previously generated depth, the
storm intensities are linearly adjusted to ensure
that the proper storm depth is always preserved.
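
The steps of the time distribution phase can be sketched as follows. This is an illustration under stated assumptions, not the authors' program. The function name is hypothetical. The 4-minute step follows the interval mentioned for the Lubbock gages. The intensities are assumed to be exponentially distributed with a rate of 9.0, whose units are not specified here. The "linear adjustment" is assumed to be a single multiplicative scale factor.

```python
import math
import random


def distribute_storm(depth, duration_min, step_min=4.0, rate=9.0, rng=None):
    """Split a storm depth into short-interval intensities.

    Returns (lengths, intensities): interval lengths in minutes and
    intensities in depth units per minute.
    """
    rng = rng or random.Random()
    # Number of intervals needed to cover the storm duration.
    n = max(1, math.ceil(duration_min / step_min))
    lengths = [step_min] * n
    # Shorten the first interval so the exact duration is preserved.
    lengths[0] = duration_min - step_min * (n - 1)
    # Purely random intensities (no serial correlation is imposed).
    raw = [rng.expovariate(rate) for _ in range(n)]
    simulated_depth = sum(i * t for i, t in zip(raw, lengths))
    # Linear adjustment so the generated storm depth is preserved.
    scale = depth / simulated_depth
    return lengths, [i * scale for i in raw]


lengths, intensities = distribute_storm(0.5, 10.0, rng=random.Random(1))
```

Because the final rescaling fixes the total depth, the rate parameter in this sketch affects only the relative shape of the intensity sequence before scaling.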

Application  of  the  Precipitation 
Model 

The  precipitation  model  was  developed  to  provide 
long  sequences  of  precipitation  events  for  input  to 
a  simulation  model  of  the  quantity  and  quality  of 
urban  runoff  from  an  urban  basin  in  Lubbock,  Tex. 

Three types of precipitation events characterize
the rainfall of the Texas High Plains (5). The first
type is the intermittent or continuous precipitation
events accompanied by a continuous cloud cover.
These events are associated with the slow upward
movement of large airmasses resulting from colder
air transported into the area by frontal activity.
The second type is the convective thunderstorm
resulting from either the rapid rise of small unstable
airmasses or squall lines preceding frontal
activity. These events can be violent, high-intensity
storms, but they are normally of short duration.
The third type is the slow drizzle associated with
stable atmospheric conditions (5).

The mean annual precipitation for Lubbock,
Tex., is approximately 18 inches, but the time distribution
is such that more than 80 percent of the
annual precipitation occurs in a 7-month period
between April and October. The heaviest rainfall
amounts occur in April, May, June, and September
(see fig. 1) and are largely the result of convective
thunderstorm events (5). Summer rainfalls are
generally associated with convective activity resulting
from the upward movement of small airmasses
which have been heated during the daylight
hours (5). The available moisture in the atmosphere
during the summer months is generally low,
and, as a result, most of these summer events are
widely scattered, light showers during the late
evening hours; however, when adequate moisture
is available, these events can be high-intensity,
violent thunderstorms. These high-intensity convective
events have historically been associated
with major flooding in the low-lying areas of the
High Plains.

Data on the occurrence of precipitation in
Lubbock, Tex., were obtained from the local
National Weather Service Office. This station has
been collecting precipitation data since 1911, but
continuous records of storm intensities began in
September 1957, with the installation of a weighing-bucket
rain gage. Data on storm date, type of
precipitation, rainfall depth, and the beginning and
ending times of each individual storm whose depth
was greater than 0.10 inch were obtained from
the daily weather observation sheets from September
1957 to September 1970.

During the study period, 480 individual storms
were recorded. These included some snow events
as well as long-duration frontal events. Emphasis
in this study was on short duration; thus,
all snow events and long-duration frontal events
were eliminated. For this study, all storms of less
than 6 hours duration were classified as short
duration. After elimination of all long-duration
events, the set of data contained a total of 357
events.

Figure 1. Monthly distribution of rainfall for Lubbock, Tex.

The statistical properties of the rainfall depth,
duration, and time since the last event are summarized
in table 1. The maximum recorded rainfall
depth was 3.25 inches. By definition, the maximum
allowable duration was 6 hours. As expected, the
time since the last event demonstrated a wide
variation with the maximum (4,359.6 hours or 181.6
days) being recorded during an extended drought
period beginning in early September 1966 and
ending in mid-March 1967.

The coefficient of variation is a statistic used to
compare the relative variability of several sets of
data. Rainfall depth and time between events have
coefficients of variation greater than one, indicating
a large relative variation in the data. On the other
hand, the storm duration has a much smaller relative
variation.

In connection with a recent matching fund
research project between the Office of Water
Resources Research and Texas Tech University,
three automatic recording rain gages have been
installed in the study area. These gages are designed
to start automatically when 0.01 inch of rainfall has
occurred and to provide a continuous time distribution
of the rainfall. These gages provided rainfall
mass curves from which the 4-minute storm intensities
can be determined.

Regression and correlation analyses were performed
on the rainfall depth and duration, and the
correlation coefficient was determined to be 0.409.
The hypothesis that the slope of the regression line
was zero (that is, that rainfall depth is linearly
independent of rainfall duration) was tested using the
appropriate t-test and rejected at the 95-percent
significance level. Although independence implies zero
correlation, the reverse is not always true since the
correlation coefficient measures only the linear
relationship.

Table 1. Statistical data on precipitation for Lubbock, Tex.

Parameter          Mean      Standard    Third           Fourth          Coefficient    Minimum    Maximum
                             deviation   moment          moment          of variation
Depth (inches)     0.375     0.392       0.188           0.3[?]          1.045          ²0.10      3.25
Duration (hours)   1.826     1.092       0.466           2.714           [?]            0.117      ²6.000
TSLE¹ (hours)      255.464   478.361     5.06 × 10^[?]   1.66 × 10^[?]   1.873          0.107      4,359.574

¹ TSLE is the time since the last convective event whose depth was greater than 0.10 inch.
² Arbitrary limits set on the data.
Entries marked [?] are illegible.

Covariance  is  another  measure  of  the  relationship 
between  variables.  Once  again,  independence 
implies  zero  covariance,  but  the  reverse  is  not 
always  true.  The  covariance  between  the  rainfall 
depth  and  duration  was  1.09.  The  conclusion  from 
both  the  correlation  coefficient  and  the  covariance 
was  that  a  significant  relationship  between  depth 
and  duration  of  rainfall  existed;  thus,  the  two 
variables  are  not  independent.  Similar  correlation 
and  regression  analyses  using  the  time  between 
events  and  rainfall  depth  and  the  time  between 
events  and  rainfall  duration  resulted  in  acceptance 
of  the  hypothesis  that  the  time  between  events 
was  linearly  independent  of  depth  and  duration  of 
the  storm.  Table  2  shows  the  test  statistics  from 
these analyses. As a result of the acceptance of the
hypothesis of linear independence between the
time since last event and the depth and duration of
rainfall, a univariate model is appropriate for the
time between events. On the other hand, a bivariate
model is appropriate for the depth and duration of
rainfall.

The Kolmogorov-Smirnov test is used to determine
if a set of data appears to be governed by
the assumed density function. Six continuous
density functions were used in this study: Normal
distribution, exponential distribution, log-normal
distribution, gamma distribution, extreme value type I
(Gumbel) distribution, and extreme value type I-A
(smallest value) distribution. The form of the density
functions and the equations used for the maximum
likelihood estimators of the parameters of the
distributions were taken from Hahn and Shapiro
(4, p. 122-134).

In this model, the data for rainfall depth were
tested for goodness-of-fit to the six distributions,
independent of rainfall duration, in order to estimate
the marginal density function of rainfall depth. A
similar analysis was conducted for rainfall duration,
to estimate the marginal density function of
rainfall duration. As a result of these analyses, the
logarithms of the rainfall depth and duration were
determined to "best fit" the normal density function.
Table 3 shows the results of these analyses.
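
The Kolmogorov-Smirnov comparison used in these analyses can be sketched as below. This is a generic illustration of the one-sample statistic, not the authors' program. The function names are hypothetical. The critical value uses the common large-sample approximation 1.63/sqrt(n) for the 1-percent level, which is an assumption about how the tabulated value was obtained.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of a normal variate."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """Largest gap between the empirical and the hypothesized CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF steps from i/n to (i + 1)/n at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_critical_99(n):
    """Large-sample critical value at the 1-percent level (approximation)."""
    return 1.63 / math.sqrt(n)


# For n = 282 the approximation gives about 0.0971, the tabulated
# statistic listed in tables 3 and 4.
critical = ks_critical_99(282)
```

To test log-normality, the statistic would be computed on the logarithms of the observations against a normal CDF with the fitted mean and standard deviation, and the fit accepted when the statistic is below the critical value.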

The  time  between  events  was  also  tested  for 
goodness-of-fit  using  the  six  assumed  density 
functions.  The  best  fit  was  obtained  by  fitting  the 
cube  root  of  the  time  between  events  to  a  gamma 
distribution  function.  Table  4  shows  the  results  of 
this  analysis. 

Operations  of  the  Precipitation  Model 

The  precipitation  model  uses  the  Monte  Carlo 
simulation  method  to  obtain  random  samples  from 
the  probability  density  functions  of  rainfall  depth, 
duration,  and  time  between  events.  The  depth  and 
duration  of  rainfall  were  determined  to  be  governed 
by  a  bivariate  log-normal  distribution  and  the  time 
between  events  was  determined  to  be  governed  by 
a  cube-root  gamma  distribution.  The  parameters 
of  the  density  functions  are  shown  in  tables  3 
and  4. 

Table 2. Regression analyses of precipitation events

Variables             Correlation    Standard error    Computed    Tabulated t value,
                      coefficient    of regression     t value     95% level
                      (R)            coefficient
Depth vs. duration    0.409          0.00630           ¹9.281      1.650
TSLE² vs. depth        .102           .00006           ³1.128      1.650
TSLE vs. duration      .018           .00044           ³.3710      1.650

¹ Reject hypothesis of linear independence at 95-percent level.
² TSLE = Time since last event.
³ Accept hypothesis of linear independence at 95-percent level.

Table 3. Goodness of fit for rainfall depth and duration

Variable            Distribution    Mean       Standard     Computed test    Tabulated
                                               deviation    statistic        statistic¹
Depth (inches)      Log-normal      -1.3193    0.7692       ²0.0935          0.0971
Duration (hours)    Log-normal        .3692     .7532       ² .0764           .0971

¹ Tabulated in (7, p. 560).
² Significant at the 99-percent level with n = 282.

Table 4. Goodness of fit for time between events

Variable         Distribution    Scale        Shape        Computed     Tabulated
                                 parameter    parameter    statistic    statistic¹
TSLE² (hours)    Gamma           0.6336       3.1369       ³0.534       0.0971

¹ Tabulated in (7, p. 560).
² TSLE = Time since last event.
³ Significant at the 99-percent level with n = 282.

The calculations are begun by generating a
random digit used to determine the rainfall duration.
The normalized variate corresponding to the
generated probability level (random digit) can be
determined from the asymptotic equation (1):

$$X_p = t - \frac{2.515517 + 0.802853\,t + 0.010328\,t^2}{1 + 1.432788\,t + 0.189269\,t^2 + 0.001308\,t^3} \tag{4}$$

where

$X_p$ is the value of the random variable at the
probability level, $p$,

and

$t = \sqrt{\ln(1/p^2)}$, $0 < p \le 0.5$.

This normalized variate has zero mean and unit
variance and, therefore, has to be adjusted by
multiplying the variate by the observed standard
deviation and adding the observed mean. The value
generated is the logarithm of the rainfall duration;
thus, the duration is determined by taking the
antilogarithm.
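
The duration step can be sketched as follows. The rational approximation is the one in equation 4, with p treated as an upper-tail probability as in the standard form of this approximation. The function names are hypothetical. The log-mean and log-standard deviation are the duration values of table 3. Natural logarithms are assumed, because the text does not state the base.

```python
import math
import random


def normal_variate(p):
    """Standard normal deviate exceeded with probability p (equation 4)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = p if p <= 0.5 else 1.0 - p  # the formula holds for 0 < p <= 0.5
    t = math.sqrt(math.log(1.0 / (q * q)))
    x = t - (2.515517 + 0.802853 * t + 0.010328 * t * t) / (
        1.0 + 1.432788 * t + 0.189269 * t * t + 0.001308 * t ** 3
    )
    return x if p <= 0.5 else -x  # use symmetry for p > 0.5


def generate_duration(p, log_mean=0.3692, log_sd=0.7532):
    """Rainfall duration (hours) for a probability level p.

    log_mean and log_sd are the table 3 values; natural logs are assumed.
    """
    z = normal_variate(p)
    return math.exp(log_mean + log_sd * z)


# One simulated duration from a uniform random probability level.
rng = random.Random(7)
p = rng.random()
while p == 0.0:  # guard against the excluded endpoint
    p = rng.random()
duration = generate_duration(p)
```

The approximation is accurate to roughly three decimal places in the variate, which is ample for simulation purposes.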

The duration was generated from the marginal
density function, independent of rainfall depth.
Since depth and duration are bivariate log-normally
distributed, the depth has to be generated from
the conditional density function assuming the value
of the rainfall duration is given.

A  second  random  digit  is  generated  and  the 
normalized  variate  determined  using  equation  4. 
Equations  2  and  3  give  the  mean  and  standard 
deviation  of  the  conditional  density  function  and 
are  used  to  adjust  this  normalized  variate.  The 
antilogarithm  is  obtained  to  determine  the  rainfall 
depth. 
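Equations 2 and 3 are not reproduced in this part of the paper. The sketch below uses the textbook conditional moments of a bivariate normal distribution, which is assumed (not confirmed) to be what those equations express; all names and the correlation argument `rho` are hypothetical.

```python
import math


def conditional_log_depth_params(log_duration, mu_depth, sd_depth,
                                 mu_duration, sd_duration, rho):
    """Mean and standard deviation of log depth given log duration.

    Standard bivariate-normal result; rho is the correlation between the
    logarithms of depth and duration.
    """
    mean = mu_depth + rho * (sd_depth / sd_duration) * (log_duration - mu_duration)
    sd = sd_depth * math.sqrt(1.0 - rho * rho)
    return mean, sd
```

The second normalized variate would then be multiplied by the returned standard deviation and added to the returned mean before taking the antilogarithm.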

Generation of time between events follows the same general methodology, beginning with the generation of a random digit. The value of the variate corresponding to the generated probability level (random digit) is determined by numerical integration of the gamma density function to the point where the area under the density curve equals the generated probability level. Time between events is assumed to be cube-root gamma distributed. Therefore, the variate must be cubed in order to obtain the time between events.
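A minimal sketch of this inversion by numerical integration follows. The integration rule (trapezoids with a fixed step), the function names, and the stopping guard are assumptions of the sketch. Treating the table 4 value 0.6336 as a rate (the reciprocal of a scale) is also an assumption; it gives a mean of 3.1369/0.6336, about 4.95, which is close to the simulated means of the cube root of time between events reported later in the paper.

```python
import math


def gamma_pdf(x, shape, rate):
    """Gamma density with the given shape and rate (rate = 1/scale)."""
    if x <= 0.0:
        return 0.0
    return rate ** shape * x ** (shape - 1.0) * math.exp(-rate * x) / math.gamma(shape)


def gamma_quantile(p, shape, rate, step=1e-3):
    """Integrate the density from zero until the area reaches p."""
    if not 0.0 < p < 1.0 or shape <= 1.0:
        raise ValueError("need 0 < p < 1 and shape > 1")
    x_max = (shape + 40.0 * math.sqrt(shape)) / rate  # far-tail guard
    area, x, f_left = 0.0, 0.0, 0.0
    while x < x_max:
        f_right = gamma_pdf(x + step, shape, rate)
        slab = 0.5 * (f_left + f_right) * step  # trapezoid area
        if area + slab >= p:
            # interpolate linearly inside the last slab
            return x + step * (p - area) / slab
        area, x, f_left = area + slab, x + step, f_right
    return x_max


def time_between_events(p, shape=3.1369, rate=0.6336):
    """Cube the gamma variate to recover the time between events."""
    return gamma_quantile(p, shape, rate) ** 3
```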

It was assumed that the 4-minute storm intensities occurred in a random manner within a storm. The observed intensities were tested for goodness of fit using the six continuous density functions previously outlined and found to best fit an exponential density function with the parameter equal to 9.0.

The distribution model determines the number of 4-minute intensities required to maintain the storm's duration. The first interval is adjusted to ensure that the proper duration is always maintained. Random digits, drawn from an exponential distribution, are generated for each interval to represent a simulated set of storm intensities. The total depth of rainfall occurring in this simulated storm is then computed by multiplying each time interval by its simulated storm intensity and summing over the duration. If required, this simulated depth is adjusted to ensure that the proper depth is preserved. These 4-minute storm intensities are then available as input to a runoff model.
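The disaggregation described above might be sketched as follows. Several details are assumptions, because the text does not give them: the exponential parameter 9.0 is treated as a rate, the first interval is shortened so the intervals sum to the storm duration, and the depth adjustment is a proportional rescaling. The function name is hypothetical.

```python
import math
import random


def disaggregate_storm(depth_in, duration_hr, rate=9.0, dt_hr=4.0 / 60.0, rng=None):
    """Split a storm into 4-minute intervals with exponential intensities.

    Returns (intervals, intensities); the intervals sum to duration_hr and
    the sum of intensity * interval equals depth_in.
    """
    rng = rng or random.Random()
    n = max(1, math.ceil(duration_hr / dt_hr - 1e-9))
    first = duration_hr - (n - 1) * dt_hr  # adjusted first interval
    intervals = [first] + [dt_hr] * (n - 1)
    raw = [rng.expovariate(rate) for _ in intervals]
    raw_depth = sum(i * t for i, t in zip(raw, intervals))
    scale = depth_in / raw_depth  # proportional adjustment (assumed rule)
    return intervals, [i * scale for i in raw]
```

Because of the final rescaling, the choice between reading 9.0 as a rate or as a mean affects only the raw draws, not the preserved depth.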

Evaluating  the  Precipitation  Model 

Stochastic  generation  of  sequential  hydrologic 
data  is  based  on  the  population  distribution  and 
parameters  estimated  using  historical  data.  It  is 
imperative,  therefore,  that  the  statistical  parameters 
of  the  generated  data  be  the  same  as  those  observed 
in  the  historical  data.  Some  error  may  be  introduced 
if  some  of  the  random  variables  deviate  from  the 
statistical  distributions  used  in  the  generation 
process.  In  addition,  the  generated  data,  since  they 
are  generally  of  no  better  quality  than  the  historical 
data,  may  be  biased  by  sampling  and  measurement 
errors  inherent  in  the  original  data. 

The test of reliability of a particular model must be based on the model's ability to reproduce the statistical moments of the historical data. The t-test, for determining whether or not a sample drawn from a normal population has a mean equal to some predetermined value, was used to evaluate the model's ability to maintain the mean values of the marginal distributions of rainfall depth and duration (table 5). It was concluded that the means of the generated depths and durations were indistinguishable from the population means at the 5-percent significance level. A chi-square test, for determining if a sample drawn from a normal population maintains a predetermined value of the variance, was used to evaluate the model's ability to maintain the variance of the marginal distributions of rainfall depth and duration (table 5). It was concluded that the variances of the generated depths and durations were maintained at the 5-percent significance level.
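As a worked check, the two statistics can be computed from the summary values alone. The sketch below uses the standard one-sample forms, which are assumed to be the ones applied in the paper; the inputs are the population log-depth moments from table 3 and one of the 500-storm rows of table 5.

```python
import math


def t_statistic(sample_mean, sample_sd, n, pop_mean):
    """One-sample t statistic for a hypothesized population mean."""
    return (sample_mean - pop_mean) / (sample_sd / math.sqrt(n))


def chi_square_statistic(sample_sd, n, pop_sd):
    """Chi-square statistic for a hypothesized population variance."""
    return (n - 1) * sample_sd ** 2 / pop_sd ** 2


# Log depth: population mean -1.3193 and standard deviation 0.7692 (table 3);
# sample mean -1.3140 and standard deviation 0.7577 from 500 storms (table 5).
t = t_statistic(-1.3140, 0.7577, 500, -1.3193)
chi2 = chi_square_statistic(0.7577, 500, 0.7692)
```

With these inputs the statistics come out at roughly 0.156 and 484, in line with the corresponding entries of table 5.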

In addition to maintaining the critical moments of the marginal distributions, it is necessary to maintain the mean vector and covariance matrix of the bivariate distribution of rainfall depth and duration. The Hotelling T² statistic was used to test the hypothesis that the mean vector for a generated sample was equal to the population mean vector (3). The mean vector of the generated data was indistinguishable from the population mean vector at the 5-percent level (table 6).
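For two variables, the Hotelling T² statistic and its F-ratio can be written out without matrix libraries. The sketch below is the standard one-sample form with the sample covariance matrix; whether the paper used exactly this variant is an assumption, and the function name is hypothetical.

```python
def hotelling_f_ratio(pairs, mu0):
    """One-sample Hotelling T-squared and F-ratio for bivariate data.

    pairs: list of (x, y) observations; mu0: hypothesized (mean_x, mean_y).
    F is referred to an F distribution with (2, n - 2) degrees of freedom.
    """
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in pairs) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in pairs) / (n - 1)
    det = sxx * syy - sxy * sxy
    dx, dy = mx - mu0[0], my - mu0[1]
    # quadratic form d' S^-1 d using the explicit 2 x 2 inverse
    t2 = n * (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    p = 2
    f_ratio = (n - p) / (p * (n - 1)) * t2
    return t2, f_ratio
```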

The conclusion drawn from this testing program was that the model of precipitation depth and duration does in fact maintain the first and second moments observed in the historical data. This gives reliability to the model's ability to reproduce the historical data.


Table 5. Statistical comparison of recorded and simulated rainfall depths and durations

             Simulated logarithm of rainfall depth              Simulated logarithm of rainfall duration
Number of    Mean      Standard    Computed t    Computed       Mean     Standard    Computed t
storms                 deviation   statistic ¹   chi-square              deviation   statistic ¹
generated                                        statistic ¹
             Inches    Inches                                   Hours    Hours
25           -1.5747   0.8969      -1.423        32.5           0.1234   0.6582      -1.867
500          -1.3140    .7577        .156        484             .3464    .7224       -.706
500          -1.3070    .7485        .518        472             .4080    .7205       1.204
500          -1.3313    .7679       -.035        497             .3605    .7689       -.025
500          -1.2989    .7517        .6068       476             .4223    .7653       1.551
600          -1.2848    .6920       -.122        494             .4155    .7244        .156

¹ Significant at the 95-percent level.
² Not significant at the 95-percent level.

Table 6. Statistical tests of simulated mean vector of the bivariate distribution of rainfall depth and duration

Number of storms generated    F-ratio ¹
25                            0.719
500                            .462
500                            .584
500                            .513
500                            .628
600                            .809

¹ All values significant at the 95-percent level.
² F-ratio using the Hotelling T² statistic (6, p. 120).


Serial correlation analyses were performed on rainfall depth and duration. Figures 2 and 3 show a comparison between the serial correlation coefficients observed in the historical data and those observed in the simulated events. Little serial correlation was found in either the recorded or simulated storms for both rainfall depth and duration. A similar analysis was performed on the recorded and simulated 4-minute storm intensities (fig. 4). The lag one serial correlation coefficient observed in the simulated intensities was significantly lower than that found in the historical data. The model does not appear to adequately maintain the lag one serial correlation coefficient of 4-minute storm intensities.
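A correlogram of the kind compared in these figures can be computed as below. The estimator shown is one common form of the lag-k serial correlation coefficient; the paper does not state which estimator was used, so this choice is an assumption.

```python
def serial_correlation(series, lag):
    """Lag-k serial correlation coefficient of a sequence."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    numer = sum(dev[i] * dev[i + lag] for i in range(n - lag))
    return numer / denom


def correlogram(series, max_lag):
    """Serial correlation coefficients for lags 0 through max_lag."""
    return [serial_correlation(series, k) for k in range(max_lag + 1)]
```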

All of the tests discussed above require that the sample be drawn from a normal or bivariate normal population. The model assumes that the logarithms of the depth and duration are normally distributed. Thus, these tests are valid. The time between events, however, is assumed to be cube root gamma distributed. No general test for distributions other than the normal is available. If the distribution does not differ significantly from the normal, the tests discussed above can be applied without large errors (7). Hann and Shapiro (4) have demonstrated that the gamma distribution approaches the normal distribution as the shape parameter increases. Although some error is introduced by using these tests to evaluate the time between events, the consequences resulting from an error in this judgment were considered to be insignificant. The application of the t-test and chi-square test to the generated time between events resulted in acceptance of the respective hypotheses and a conclusion that the critical statistical moments of the cube root of the time between events are being maintained (table 7).

Figure 2. Correlograms of recorded and simulated rainfall depths.

Figure 3. Correlograms of recorded and simulated rainfall duration.

Table 7. Statistical comparison of recorded and simulated time between events

             Simulated cube root of time between events
Number of    Mean     Standard    Computed t    Chi-square
storms                deviation   statistic     statistic
generated
             Hours    Hours
25           3.570    2.024       ¹ 3.412       ² 12.5
500          4.836    2.693       ² 0.955       ² 463
500          5.008    3.056       ² 0.418       ¹ 596
500          4.908    2.761       ² 0.350       ² 486
500          4.822    2.754         1.045       ² 484
600          4.993    1.867       ² 0.358       ¹ 629

¹ Not significant at the 95-percent level.
² Significant at the 95-percent level.

Figure 4. Correlograms of recorded and simulated 4-minute storm intensities.

Conclusions

The statistical model of short-duration precipitation events presented in this paper appears to be a valid representation of the occurrence of these storms. The usefulness of this model is in the integration of the sequences of generated storms into a rainfall-runoff model, thereby providing simulated future water supplies which can then be used to evaluate various alternative water resource developments. The generation of hydrologic sequences should not be viewed as the end product. In actuality, the evaluation of alternative generation schemes is, in itself, an academic exercise in futility unless there is some interaction with the "real world" planning process. Therefore, the model developed in this study is not a significant contribution to expanding the outer boundaries of knowledge when considered by itself. However, this work does provide the mechanism for developing water resource planning techniques in areas where the precipitation regimen is dominated by short-duration, convective rainfall.

The statistical distribution parameters estimated in this study apply only to the High Plains of Texas; thus, new parameters would need to be estimated for each new area of application. Future applications of this technique could lead to the delineation of homogeneous areas of application and expansion of the concept to include frontal storms.

The model appears to adequately describe the occurrence of short-duration storms for Lubbock, Tex. The means and standard deviations of the generated sequences were statistically indistinguishable from the means and standard deviations of the historical data. In addition, the model maintained the mean vector of the bivariate distribution of rainfall depth and duration. Little serial correlation was observed in either the observed or generated rainfall depths and durations. The observed 4-minute storm intensities did demonstrate some serial correlation of lag one. The model, on the other hand, assumes no serial correlation exists.

Future improvements in the model are planned. These improvements include a more realistic time distribution model of 4-minute intensities and incorporation of a spatial distribution model.

RECURRENCE INTERVALS OF ANNUAL MINIMUM STREAMFLOWS

By E. S. Joseph ¹


SUMMARY 

The paper proposes an analytical approach for estimating the magnitudes of annual droughts in a stream corresponding to specified return periods. The method is illustrated by a frequency analysis performed on the data of 26 streams in Missouri. With the help of a computer program, this method yields the results readily and overcomes the disadvantages inherent in graphical procedures.

Introduction 

Prerequisite to comprehensive planning for development of water resources in a river basin is the knowledge of the low-flow characteristics of streams in the basin. Information essential to planned development of water resources includes the magnitude and frequency of annual droughts. Such information concerning probable extremes can be obtained by a frequency analysis, using past records of the watershed. Owing to the variation of streamflow minimums, the analysis must necessarily be done in a probabilistic framework.

Among the many theoretical probability distributions proposed for annual minimum hydrological events, the Weibull model is one of the most frequently used. Chi-square and other goodness-of-fit tests offer evidence to show that for annual droughts the type III asymptotic distribution of extremes gives a better fit than log-normal, square-root normal, and normal distributions (6).

Weibull  Distribution 

The  probability  density  and  distribution  functions 
of  a  three-parameter  Weibull  distribution  are  given 
by: 


¹ Department of Mathematics, Keystone Junior College, LaPlume, Pa.


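$$f(x) = \frac{k}{v-\epsilon}\left(\frac{x-\epsilon}{v-\epsilon}\right)^{k-1}\exp\left[-\left(\frac{x-\epsilon}{v-\epsilon}\right)^{k}\right] \qquad (1)$$

$$F(x) = 1 - \exp\left[-\left(\frac{x-\epsilon}{v-\epsilon}\right)^{k}\right] \qquad (2)$$

where ε, k, and v are parameters. The characteristic drought, v (the drought which will be exceeded 36.79 percent of the time), and the lower limit, ε, have the same dimension as the variable, whereas the shape parameter, k, is dimensionless.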

In many engineering problems, the lower limit, ε, of the distribution is of particular importance. It may be negative, zero, or positive depending upon the physical conditions which govern the phenomenon. For streamflow minimums, it is nonnegative.

Estimation  of  Lower  Limit 

The lower limit, ε, of a Weibull distribution can be estimated in several ways, such as the method of moments, the method using the distribution of the smallest observed drought, and logarithmic extremal probability plotting.

The method of moments estimates ε using the sample mean, x̄, and sample standard deviation, s, from

$$\epsilon = \bar{x} - \frac{s\,\Gamma(1 + 1/k)}{\left[\Gamma(1 + 2/k) - \Gamma^{2}(1 + 1/k)\right]^{1/2}} \qquad (3)$$

where k is estimated by equating the sample skewness, √b₁, to the population skewness by the relation

$$\sqrt{b_1} = \frac{\Gamma(1 + 3/k) - 3\,\Gamma(1 + 2/k)\,\Gamma(1 + 1/k) + 2\,\Gamma^{3}(1 + 1/k)}{\left[\Gamma(1 + 2/k) - \Gamma^{2}(1 + 1/k)\right]^{3/2}} \qquad (4)$$

Tables  and  charts  prepared  by  Gumbel  (4)  render 
the  calculations  in  equations  3  and  4  fairly  simple. 
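Where the tables and charts are not at hand, the two moment relations can be solved numerically. The sketch below solves the skewness relation of equation 4 for k by bisection and then applies equation 3; the bracketing interval, the tolerance, and the function names are assumptions of this illustration, not part of the original procedure.

```python
import math


def weibull_skewness(k):
    """Population skewness of a Weibull distribution with shape k."""
    g1, g2, g3 = (math.gamma(1.0 + j / k) for j in (1, 2, 3))
    return (g3 - 3.0 * g2 * g1 + 2.0 * g1 ** 3) / (g2 - g1 * g1) ** 1.5


def shape_from_skewness(skew, lo=0.2, hi=50.0, tol=1e-10):
    """Solve the skewness relation for k by bisection.

    Relies on the skewness decreasing as k increases over the bracket.
    """
    if not weibull_skewness(hi) <= skew <= weibull_skewness(lo):
        raise ValueError("skewness outside the bracketed range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if weibull_skewness(mid) > skew:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lower_limit_by_moments(mean, sd, k):
    """Moment estimate of the lower limit from mean, sd, and shape k."""
    g1 = math.gamma(1.0 + 1.0 / k)
    g2 = math.gamma(1.0 + 2.0 / k)
    return mean - sd * g1 / math.sqrt(g2 - g1 * g1)
```

Nothing in this calculation forces the estimated lower limit to fall between zero and the smallest observation, so the result still has to be checked for admissibility.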

The smallest drought, x₁, observed in a sample of size n is a statistical variable whose probability function is the nth power of the probability function of the droughts. Based on this logic:

$$\epsilon = x_1 - \frac{\bar{x} - x_1}{n^{1/k} - 1} \qquad (5)$$

where k is estimated by the relation:

$$\frac{\bar{x} - x_1}{s} = \frac{\left(1 - n^{-1/k}\right)\Gamma(1 + 1/k)}{\left[\Gamma(1 + 2/k) - \Gamma^{2}(1 + 1/k)\right]^{1/2}} \qquad (6)$$

It is clear from equation 5 that the lower limit estimated by this method will always be less than the smallest observed drought in the sample. A table and a graph prepared by Gumbel (5) facilitate the calculations in equation 6.

According to theory, a plot of the observations in a sample from a Weibull distribution on logarithmic extremal probability paper tends to be linear if the lower limit of the distribution is zero. If the lower limit is not zero, the fit would tend to depart from a straight line. Thus, the lower limit can be estimated by plotting values (x − ε) for various ε and selecting that value of ε which renders the fit nearly straight (7).

Estimation of Characteristic Drought and Shape Parameter

The estimates of shape parameter, k, and characteristic value, v, can be obtained by different methods. For a two-parameter Weibull distribution whose lower limit is zero, the maximum likelihood estimates are

$$\frac{\sum_{i=1}^{n} x_i^{k}\ln x_i}{\sum_{i=1}^{n} x_i^{k}} - \frac{1}{k} - \frac{1}{n}\sum_{i=1}^{n}\ln x_i = 0 \qquad (7)$$

$$v = \left[\frac{1}{n}\sum_{i=1}^{n} x_i^{k}\right]^{1/k} \qquad (8)$$

This method may also be employed for a three-parameter Weibull distribution whose lower limit is known.

The moment estimate of k is obtained by solving equation 4. The moment estimate of the characteristic drought is

$$v = \bar{x} + \frac{s\left[1 - \Gamma(1 + 1/k)\right]}{\left[\Gamma(1 + 2/k) - \Gamma^{2}(1 + 1/k)\right]^{1/2}} \qquad (9)$$

Return  Periods  of  Drought 

Let the magnitude of the drought whose recurrence interval is to be determined be x and let the annual droughts X be described by a Weibull distribution with parameters ε, k, and v. Then,

$$P[X \le x] = F(x) = 1 - \exp\left[-\left(\frac{x-\epsilon}{v-\epsilon}\right)^{k}\right] \qquad (10)$$

Equation 10 states that the annual drought in any one year will be equal to or less than x with a probability F(x). Over a long period of years, the proportion of annual droughts equal to or less than x will be F(x), and the average recurrence interval between such droughts will be:

$$T_x = 1/F(x) \ \text{years.} \qquad (11)$$

Therefore, x may be defined as a T_x-year drought (Gumbel 1954). Combining equations 10 and 11:

$$T_x = 1 \Big/ \left\{1 - \exp\left[-\left(\frac{x-\epsilon}{v-\epsilon}\right)^{k}\right]\right\} \qquad (12)$$

$$x = \epsilon + (v - \epsilon)\left[-\ln\left(1 - \frac{1}{T_x}\right)\right]^{1/k} \qquad (13)$$

Equation 13 establishes the relation between magnitudes and recurrence intervals of annual droughts whose distribution obeys a Weibull probability law with parameters ε, k, and v. Equation 13 is consistent with the fact that the droughts decrease in amount and increase in severity with increasing return periods.
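The relation between magnitude and recurrence interval is easy to evaluate directly. The sketch below inverts the Weibull distribution function for a given return period and also computes the return period of a given magnitude; the function names and the parameter values in the usage note are hypothetical.

```python
import math


def drought_magnitude(return_period, eps, k, v):
    """Magnitude of the T-year drought for Weibull parameters eps, k, v."""
    if return_period <= 1.0:
        raise ValueError("return period must exceed 1 year")
    return eps + (v - eps) * (-math.log(1.0 - 1.0 / return_period)) ** (1.0 / k)


def return_period(x, eps, k, v):
    """Average recurrence interval, in years, of droughts no larger than x."""
    if x <= eps:
        raise ValueError("x must exceed the lower limit")
    return 1.0 / (1.0 - math.exp(-(((x - eps) / (v - eps)) ** k)))
```

For example, with illustrative values eps = 1.5, k = 1.2, and v = 20, `drought_magnitude(100, 1.5, 1.2, 20)` is smaller than `drought_magnitude(10, 1.5, 1.2, 20)`, and both stay above the lower limit.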

Analysis  of  Droughts 

Knowledge of the areal availability of water during critical periods of dryness is paramount to many water resource design problems. With this in view, the frequency characteristics of annual droughts at 26 stream gaging stations in Missouri were studied. For the purpose of this study, an annual drought is to be understood as the lowest mean discharge in cubic feet per second for 14 consecutive days during a climatic year beginning April 1.

The first effort was to estimate the lower limit, ε. The estimated ε must satisfy the condition 0 ≤ ε ≤ x₁, where x₁ is the smallest observed drought in the sample. The lower limit was estimated by the method using the distribution of the smallest observed drought and by the method of moments. Of the 26 estimates obtained by the former method, 20 satisfied the above condition in contrast to two out of 26 estimates obtained by the latter method. The inadmissible values were discarded. For six samples, either method yielded negative lower limits, and hence they were treated as zeros. The final estimates of the lower limit are presented in table 1.


Table 1. Magnitudes of annual droughts for indicated return periods in Missouri

                                                               Magnitude of drought
Name of station                        Lower      Return       Moment        Likelihood
                                       limit ε    period       estimates     estimates
                                                  Years        Cubic feet    Cubic feet
                                                               per second    per second
Chariton River at Novinger             0          10           2.1           1.8
                                                  50            .5            .4
                                                  100           .3            .2
Thompson River at Trenton              0          10           8.7           4.7
                                                  50           2.8           1.0
                                                  100          1.7            .5
Grand River near Gallatin              1.5        10           6.7           5.3
                                                  50           2.6           2.2
                                                  100          2.1           1.9
Grand River near Sumner                4.7        10           39.2          32.5
                                                  50           14.3          11.5
                                                  100          10.3          8.4
Chariton River near Prairie Hill       2.9        10           7.7           8.1
                                                  50           4.0           4.1
                                                  100          3.4           3.5
Salt River near New London             0          10           2.9           2.4
                                                  50            .6            .4
                                                  100           .3            .2
Lamine River at Clifton City           0          10            .5            .4
                                                  50            .1            .0
                                                  100           .0            .0
Spring River near Waco                 3.5        10           10.9          17.7
                                                  50           4.9           7.5
                                                  100          4.2           5.8
Tarkio River at Fairfax                0          10           2.2           1.0
                                                  50            .4            .1
                                                  100           .2            .0
Nodaway River near Burlington Jct.     .9         10           3.3           5.2
                                                  50           1.3           1.9
                                                  100          1.1           1.4

Table 1. Magnitudes of annual droughts for indicated return periods in Missouri (Continued)


Name  of  station 


Lower 
limit 
of  € 


Return 
period 


Magnitude of drought


Moment 
estimates 


Likelihood 
estimates 


One  Hundred  and  Two  River  near  Maryville. 


Platte  River  near  Agency. 


Locust Creek near Linneus.


Cuivre  River  near  Troy. 


Gasconade  River  near  Rich  Fountain. 


Bourbeuse  River  at  Union. 


Meramec  River  near  Eureka. 


Big  River  at  Byrnesville. 


Pomme de Terre River at Hermitage.


Gasconade  River  at  Jerome. 


Gasconade  River  at  Hazlegreen. 


Gasconade  River  near  Waynesville. 


Little  Piney  Creek  at  Newburg. 


Big  Piney  River  near  Big  Piney. 


Jacks  Fork  at  Eminence. 


James  River  at  Galena. 


279.2 


13.2 


203.2 


33.6 


273.4 


22.4 


49.5 


23.7 


71.1 


63.9 


12.1 


Years 

10 
50 
100 

10 
50 

100 
10 
50 

100 
10 
50 

100 
10 
50 

100 
10 
50 

100 
10 
50 

100 
10 
50 

100 
10 
50 

100 
10 
50 

100 
10 
50 

100 
10 
50 

100 
10 
50 

100 
10 
50 

100 
10 
50 

100 
10 
50 

100 


Cubic  feet 
per  second 
.4 
.1 
.0 

2.4 
.4 
.2 

1.0 
.3 
.1 

1.4 
.4 
.2 

289.7 
210.2 
279.5 
13.8 
13.3 
13.2 
242.9 
212.2 
208.0 
45.7 
36.6 
35.3 
.5 
.0 
.0 

283.9 
274.5 
273.8 
25.0 
22.7 
22.5 
52.5 
49.7 
49.6 
26.1 
24.2 
24.0 
73.6 
71.4 
71.2 
84.7 
71.8 
69.2 
22.4 
14.0 
13.1 


Cubic  feet 
per second

The next effort was to determine the likelihood estimates of the shape parameter and the characteristic value. For comparison, the moment estimates were also computed using equations 4 and 9. Estimation of k and v by the method of maximum likelihood involved the following steps. (1) Replace x_i by (x_i − ε) in equations 7 and 8. (2) Solve equation 7 for k. (A trial value of 0.1 was used for iteration.) (3) To obtain the characteristic drought, add ε to the v computed in equation 8. The justification of step 3 is the fact that v_x = v_(x−ε) + ε. A computer program was employed to solve equation 7 by Newton's method of iteration. The program computes the shape parameter, the characteristic drought, and the drought magnitudes of desired recurrence.
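The three steps can be sketched as follows. Because the likelihood equations are not legible in this copy, the sketch assumes the standard two-parameter Weibull likelihood equation for k and the matching closed form for v; the function names, the tolerance, and the guard against a non-positive Newton step are likewise assumptions. Every shifted observation must be positive, so the lower limit has to be strictly below the smallest observed drought.

```python
import math


def likelihood_residual(y, k):
    """Left side of the (assumed) likelihood equation for the shape k."""
    logs = [math.log(v) for v in y]
    w = [v ** k for v in y]
    return sum(a * b for a, b in zip(w, logs)) / sum(w) - 1.0 / k - sum(logs) / len(y)


def weibull_mle(x, eps=0.0, k0=0.1, tol=1e-10, max_iter=200):
    """Newton iteration for k, then v; returns (k, characteristic drought)."""
    y = [xi - eps for xi in x]  # step 1: shift by the lower limit
    if min(y) <= 0.0:
        raise ValueError("lower limit must be below every observation")
    logs = [math.log(v) for v in y]
    k = k0
    for _ in range(max_iter):  # step 2: Newton's method
        w = [v ** k for v in y]
        s0 = sum(w)
        s1 = sum(a * b for a, b in zip(w, logs))
        s2 = sum(a * b * b for a, b in zip(w, logs))
        g = s1 / s0 - 1.0 / k - sum(logs) / len(y)
        dg = s2 / s0 - (s1 / s0) ** 2 + 1.0 / (k * k)
        k_new = k - g / dg
        if k_new <= 0.0:
            k_new = 0.5 * k
        converged = abs(k_new - k) < tol
        k = k_new
        if converged:
            break
    v_shifted = (sum(v ** k for v in y) / len(y)) ** (1.0 / k)
    return k, v_shifted + eps  # step 3: add the lower limit back
```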

The magnitudes of drought for various return periods were determined by substituting estimated ε, k, and v in equation 13. The 10-, 50-, and 100-year droughts based on likelihood estimates are presented in table 1. For comparison, the values obtained by means of moment estimates are also shown in table 1.

Conclusions 

The classical method of moments yields estimates from the first three sample moments. The calculations are relatively simple; however, as Gumbel points out, the third sample moment has a large variance and, therefore, may not be very reliable, especially for small samples. Moreover, the moment method does not guarantee the necessary condition that the estimated lower limit has to be smaller than the smallest observed drought in the sample. The method based on the distribution of the smallest observed drought overcomes this difficulty and is hence preferable to the method of moments. In this investigation, 24 out of the 26 moment estimates of ε did not satisfy the condition 0 ≤ ε ≤ x₁.

Many past investigators advocated logarithmic extremal probability plotting. Determination of ε is done by repeatedly plotting log(x − ε) for various ε and selecting that value of ε which yields a linear fit. Return periods can be read directly from such a plot. However, doing this by hand is time consuming, and, moreover, the results may vary from one curve fitter to another.


The method of maximum likelihood yields the best asymptotically normal estimate. A comparison of the 10-, 50-, and 100-year drought magnitudes obtained by the method of maximum likelihood with those obtained by logarithmic extremal probability plotting revealed close agreement. However, the calculations are relatively more difficult and a computer is needed.


Notation 

k      shape parameter of a Weibull distribution
ε      lower limit of a Weibull distribution
v      characteristic value of a Weibull distribution
f(x)   probability density function
F(x)   distribution function
x̄      sample mean
s      sample standard deviation
√b₁    sample skewness
Γ( )   gamma function
n      sample size
x₁     smallest observed drought
T      return period

A TWO-DISTRIBUTION METHOD FOR FITTING MIXED DISTRIBUTIONS IN HYDROLOGY

By K. P. Singh ¹


Abstract 

Samples of many hydrologic variables, such as floods, runoffs, and droughts, do not conform to any one standard distribution. In this paper, the distributions obtained from the samples have been considered as mixed distributions, each of which can be decomposed into two or more components. An objective methodology has been developed for obtaining the parameter estimates of two component distributions constituting a mixed distribution. This methodology generates first-order estimates of means and variances of the component distributions and then refines them using an iteration procedure that minimizes the sum of squared differences between the observed and fitted normal deviates. A mixture of only two normal distributions is shown to simulate satisfactorily the observed annual floods and monthly streamflows. The magnitudes of mean and variance obtained from the fitted mixed distributions are in accord with the corresponding estimates obtained from the observed flood distributions. In the case of monthly streamflow, the parameters of the component distributions possess annual cycles.

Introduction 

Distributions of many hydrologic variables, such as annual floods, monthly streamflows, low flows, and droughts, cannot be approximated satisfactorily by any one of the known standard distributions. The choice of distribution is usually guided not by physical reasoning but by a comparatively better fit to the observed distribution. The preference for a distribution in analyzing an observed hydrologic distribution is conditioned partly by the extent of its general acceptance and partly by its amenability to rigorous mathematical analysis. The Water Resources Council's acceptance of the log-Pearson type III distribution as a base method for flood frequency analysis (2) can be cited as an example.

Observed probability distributions of annual floods at 33 gaging stations and of monthly runoffs for 30 streams in Illinois, plotted on log-normal probability paper, indicate that these distributions can have concave, convex, or reverse curvatures and doglegs. Presently used theoretical distributions cannot simulate S-curves, curves with reverse curvature, or doglegs. There is no valid reason to expect that a hydrologic distribution should follow one or the other chosen statistical distribution. A new method is needed to objectively simulate the various distribution shapes observed.

Complicated distributions can easily be thought of as decomposed into a few simpler distributions (7) which tend to become more and more normal as the decomposition proceeds to more elementary levels. Hald (7) designated a population formed by combining two populations in a given proportion as a heterogeneous population and the resultant distribution as a heterogeneous distribution. In this paper, such populations and probability distributions are termed "mixed populations" and "mixed distributions." Some reasons for considering hydrologic distributions as mixed distributions are given elsewhere (4, 9, 10, 12, 13).

Mixed  Distribution  Model 

Consider a mixed distribution of a variable x to consist of k normal distributions with means μ_i, variances σ_i², and weights a_i, such that the weights are nonnegative and their sum is unity. The probability function for the resultant mixed distribution is

$$P(x) = \sum_{i=1}^{k} \frac{a_i}{\sigma_i\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[-\frac{(x' - \mu_i)^2}{2\sigma_i^2}\right] dx' \qquad (1)$$

or

$$P(x) = \sum_{i=1}^{k} a_i P_i(x). \qquad (2)$$

Thus, for any given value of x, the probability for a mixed distribution is obtained by summing up the products of respective weights and probabilities of the component distributions. In equation 1, the component distributions have been considered as normal distributions, which are widely used and understood and serve as a standard for other distributions. However, equation 2 will hold for other distributions also. Only the right-hand side of equation 1 need be changed to accommodate the proper distribution function.
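The weighted sum of component probabilities is straightforward to evaluate. The sketch below does so for normal components using the error function; the function names and the tuple layout for the components are choices made for this illustration.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mixed_cdf(x, components):
    """Probability of a mixed distribution at x.

    components: list of (weight, mean, standard deviation) tuples whose
    weights are nonnegative and sum to one.
    """
    weights = [a for a, _, _ in components]
    if any(a < 0.0 for a in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to one")
    return sum(a * normal_cdf(x, mu, sd) for a, mu, sd in components)
```

Replacing `normal_cdf` with another component distribution function leaves the weighted sum unchanged, which is the point made about equation 2.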

Decomposition of a mixed distribution into its components requires solution of equation 1 in conjunction with higher order moment equations (3) or an iteration procedure that minimizes the sum of squared differences between the observed and fitted normal deviates. If the method of moments is used in evaluating μ_i, σ_i, and a_i, the highest order of moment needed is given by 3k − 1. When k equals 2, the highest order is 5. The sample size of hydrologic data is usually 50 or less. For such a sample size, even the estimate of a third-order moment can have considerable bias because the method of moments gives a greater weight to the tail points, and this weight increases geometrically with the increase in the order of moment. Unfortunately, the values at the tail ends of a distribution are not as good for estimating their expected values as are those in the midrange. Thus, a maximum value of 2 for k is perhaps the limit in processing hydrologic data for evaluating parameters of a mixed distribution.

Cohen (3) obtained a mixed sample by combining two separate samples of 334 and 672, with given means and standard deviations. With the use of certain assumptions to obtain first approximations, and a number of trials to achieve a solution, his moment and minimum chi-square estimates of parameters μ₁, σ₁, μ₂, σ₂, and a₁ vary considerably from the given values for the two samples. Cohen recommends the method described in his paper for large samples only.

An objective method, designated as the
two-distribution method, has been developed to obtain
the estimates of \(\mu_1\), \(\sigma_1\), \(\mu_2\), \(\sigma_2\), and \(a_1\), using an
iteration procedure that minimizes the sum of
squared differences between the observed and
fitted normal deviates. The method is free from
the errors and uncertainties that exist when third
and higher order moments are used. The procedure
has been programmed in FORTRAN IV for the
IBM 360/75 computer to yield best-fit estimates of
parameters for matching a mixed distribution to
the observed distribution. The average cost of
calculating \(\mu_1\), \(\sigma_1\), \(\mu_2\), and \(\sigma_2\), with value of \(a_1\) as
0.5, printing the results, and producing the
CALCOMP plot of the fitted mixed distribution
and component distributions together with the data
points, is $2 for a sample size of 50. In case \(a_1\) is
not known, the computer can yield a best-fit value
of \(a_1\), but the cost is about 10 times the average
cost outlined above.

Computer  Program 

Without  delving  into  the  many  problems  and 
decisions  faced  in  developing  the  program,  its 
main  features  and  mode  of  working  are  described 
briefly  in  the  following  steps. 

1. Data Reduction. The sample is ranked from
low to high. The ranked matrix is log-transformed,
if so indicated, to yield \(x\), and the column vectors
of associated probabilities, \(P\), and standard normal
deviates, \(Z\), are added using a P-Z subroutine.
Any zero values in the ranked matrix are not
transformed and a count is kept of the number of zeroes,
say \(n_1\). The value of \(P\) is calculated from \(P = m/(n+1)\)
in which \(m\) is the rank order and \(n\) is the sample size.
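
The data-reduction step can be sketched as follows. This is an illustrative reading of the description, not the original FORTRAN: the function name `reduce_data`, the use of base-10 logarithms, and the tuple layout are assumptions.

```python
import math
from statistics import NormalDist


def reduce_data(sample, log_transform=True):
    """Rank a sample and attach plotting positions P and normal deviates Z."""
    ranked = sorted(sample)                      # low to high
    n = len(ranked)
    std_normal = NormalDist()
    rows, zero_count = [], 0
    for m, value in enumerate(ranked, start=1):
        if value == 0:
            zero_count += 1                      # zeroes are counted, not transformed
            x = 0.0
        elif log_transform:
            x = math.log10(value)                # assumption: base-10 logarithms
        else:
            x = float(value)
        p = m / (n + 1)                          # P = m/(n+1)
        z = std_normal.inv_cdf(p)                # the "P-Z" conversion
        rows.append((x, p, z))
    return rows, zero_count


rows, zeros = reduce_data([100, 10, 0, 1000])
```

Each returned row holds the transformed value, its probability, and its standard normal deviate, which is the ranked matrix the later steps work from.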

2. Characteristic Tracers. The sample is split
into halves, and second-degree curves are fitted to
these halves using \(x\) and \(Z\) coordinates. Values of
\(x\) at probability levels of 0.15, 0.35, 0.65, and 0.85,
that is, \(x_{0.15}\), \(x_{0.35}\), \(x_{0.65}\), and \(x_{0.85}\), are calculated
from the fitted curves. These values are designated
as characteristic tracers for obtaining estimates of
the parameters \(\mu_1\), \(\sigma_1\), \(\mu_2\), \(\sigma_2\), and \(a_1\).

3. First Estimate of Parameters. With the
assumption that \(\mu_2 = \sigma_2 = 0\) and \(a_1 = 0.5\), values of \(P_1\)
are calculated from \(P_1 = P/a_1\) and are converted to
\(Z_1\) values using a P-Z subroutine. Values of \(\mu_1\)
and \(\sigma_1\) are obtained from

\[ \sigma_1 = \frac{x_{0.35} - x_{0.15}}{Z_{1(0.35)} - Z_{1(0.15)}} \tag{3} \]

and

\[ \mu_1 = x_{0.15} - Z_{1(0.15)}\,\sigma_1 \tag{4} \]

in which subscript 1 refers to the first component
distribution. Using \(\sigma_1\) and \(\mu_1\) from equations 3 and
4 and a Z-P subroutine, \(P_{1(0.65)}\) and \(P_{1(0.85)}\) are
calculated from the corresponding Z scores

\[ Z_{1(0.65)} = \frac{x_{0.65} - \mu_1}{\sigma_1} \tag{5} \]

and

\[ Z_{1(0.85)} = \frac{x_{0.85} - \mu_1}{\sigma_1}. \tag{6} \]

Values of \(P_{2(0.65)}\) and \(P_{2(0.85)}\) are obtained using the
relation

\[ P_2 = \frac{P - a_1 P_1}{1 - a_1}. \tag{7} \]

If \(P_1\) is less than 0.00001 or greater than 0.99999,
the program shifts to step 5.

The \(P_2\) values are converted to \(Z_2\), and \(\sigma_2\) and \(\mu_2\)
are calculated as follows:

\[ \sigma_2 = \frac{x_{0.85} - x_{0.65}}{Z_{2(0.85)} - Z_{2(0.65)}} \tag{8} \]

and

\[ \mu_2 = x_{0.65} - Z_{2(0.65)}\,\sigma_2. \tag{9} \]

These estimates of \(\sigma_2\) and \(\mu_2\) are utilized in
computing \(Z_{2(0.15)}\) and \(Z_{2(0.35)}\) as follows:

\[ Z_{2(0.15)} = \frac{x_{0.15} - \mu_2}{\sigma_2} \tag{10} \]

and

\[ Z_{2(0.35)} = \frac{x_{0.35} - \mu_2}{\sigma_2}, \tag{11} \]

which are converted to \(P_2\) values. New values of
\(P_{1(0.15)}\) and \(P_{1(0.35)}\) are obtained from

\[ P_1 = \frac{P - (1 - a_1) P_2}{a_1}. \tag{12} \]

If \(P_1\) is less than 0.00001 or greater than 0.99999,
the program shifts to step 5. The corresponding Z
scores yield a new estimate of \(\sigma_1\) and \(\mu_1\) using
equations 3 and 4. With the new set of \(\sigma_1\) and \(\mu_1\)
values, the process is repeated from equations 5
through 12 and back to equation 3 until the absolute
difference between the successive values of each
parameter becomes less than 0.02 or the number of
iterations exceeds 10.
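
One pass through equations 3 to 9 can be sketched as below. This is a hedged illustration rather than the original program: the function names are hypothetical, the full loop back through equations 10 to 12 is omitted, and the placement of the 0.00001/0.99999 guard on the probability handed to the inverse normal is an assumption.

```python
from statistics import NormalDist

STD = NormalDist()
P_MIN, P_MAX = 0.00001, 0.99999


def first_component(x15, x35, a1=0.5):
    """Equations 3 and 4: mu1, sigma1 from the lower tracers with P1 = P/a1.

    Requires a1 > 0.35 so that P1 stays below 1.
    """
    z15 = STD.inv_cdf(0.15 / a1)
    z35 = STD.inv_cdf(0.35 / a1)
    sigma1 = (x35 - x15) / (z35 - z15)           # equation 3
    mu1 = x15 - z15 * sigma1                     # equation 4
    return mu1, sigma1


def second_component(x65, x85, mu1, sigma1, a1=0.5):
    """Equations 5 to 9; returns None where the text shifts to step 5."""
    z2 = []
    for p, x in ((0.65, x65), (0.85, x85)):
        p1 = STD.cdf((x - mu1) / sigma1)         # equations 5 and 6, then Z to P
        p2 = (p - a1 * p1) / (1.0 - a1)          # equation 7
        if not P_MIN <= p2 <= P_MAX:             # assumed placement of the guard
            return None
        z2.append(STD.inv_cdf(p2))
    sigma2 = (x85 - x65) / (z2[1] - z2[0])       # equation 8
    mu2 = x65 - z2[0] * sigma2                   # equation 9
    return mu2, sigma2
```

When the two components are well separated, the tracers below the median are governed almost entirely by the first component and those above it by the second, which is why the starting assumption \(P_1 = P/a_1\) is workable.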

4. Final Estimate of Parameters. The last set
of \(\mu_1\), \(\sigma_1\), \(\mu_2\), and \(\sigma_2\) for a given value of \(a_1\) from
step 3 is taken as the first set (or old set) of
parameter estimates. These parameter estimates are now
applied to the first half of the sample allowing for
any \(n_1\) in the zero count. \(Z_2\) scores are calculated
from the sample

\[ Z_2 = \frac{x - \mu_2}{\sigma_2}. \tag{13} \]

These are converted to \(P_2\) using a Z-P subroutine,
and values of \(P_1\) are obtained from

\[ P_1 = \frac{P - (1 - a_1) P_2}{a_1}. \tag{14} \]

If \(P_1\) is negative for any data point(s) in the
beginning, the zero count is increased accordingly.
Values of \(Z_1\) corresponding to \(P_1\) are tested for
\(Z_{i+1} - Z_i\) being positive. In case it is negative, the
point \(i\) is dropped and the zero count increased.
The first half of the sample allowing for the zero
count, \(N_1\), is processed to compute \(\sum Z_1\), \(\sum x\), \(\sum x^2\),
and \(\sum Z_1 x\) from which new values of \(\sigma_1\) and \(\mu_1\) are
obtained as follows:

\[ \sigma_1 = \frac{N_1 \sum x^2 - \left(\sum x\right)^2}{N_1 \sum Z_1 x - \sum x \sum Z_1} \tag{15} \]

and

\[ \mu_1 = \frac{\sum x - \sigma_1 \sum Z_1}{N_1}. \tag{16} \]

These values of \(\sigma_1\) and \(\mu_1\) are used to calculate
\(Z_1\) scores for the second half of the sample from

\[ Z_1 = \frac{x - \mu_1}{\sigma_1} \tag{17} \]

which are converted to \(P_1\) values. Thus, \(P_2\) values
are obtained from

\[ P_2 = \frac{P - a_1 P_1}{1 - a_1} \tag{18} \]

and tested for their being not equal to or greater
than 1.0. If there is any value equal to or greater
than 1.0 at the end, the end point(s) will be dropped
from the sample. The \(Z_2\) scores corresponding to
\(P_2\) are calculated and tested in a similar fashion as
the \(Z_1\) scores. New values of \(\sigma_2\) and \(\mu_2\) are
calculated using equations 15 and 16 replacing the
subscript 1 by 2. The new set of \(\mu_1\), \(\sigma_1\), \(\mu_2\), and \(\sigma_2\) is
compared with the previous set. If the absolute
difference between the values of any parameter in
these two sets exceeds 0.005, the program labels
the new set as the old set and repeats the process.
If the absolute differences do not exceed the
tolerance limits, the last set of parameters is taken as
the final set. The \(\sum \Delta Z^2\) statistic is computed for
the sample; \(\Delta Z\) equals the difference between the
\(Z\) from the ranked data matrix and the \(Z\) corresponding
to \(P\) from \(P = a_1 P_1 + (1 - a_1) P_2\). The \(\sum \Delta P^2\)
statistic is also computed. If the process has been
repeated 10 times without achieving the tolerance
limit, the program shifts to step 5.
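
Equations 15 and 16 amount to a least-squares line relating \(Z\) to \(x\) for one component. A minimal sketch follows; the function name `regress_component` is hypothetical, and the point-dropping tests described in the step are left out.

```python
def regress_component(xs, zs):
    """Equations 15 and 16: mu and sigma of one component from x and its Z scores."""
    n = len(xs)
    sum_x = sum(xs)
    sum_z = sum(zs)
    sum_xx = sum(x * x for x in xs)
    sum_zx = sum(z * x for x, z in zip(xs, zs))
    sigma = (n * sum_xx - sum_x ** 2) / (n * sum_zx - sum_x * sum_z)   # equation 15
    mu = (sum_x - sigma * sum_z) / n                                   # equation 16
    return mu, sigma


# Points lying exactly on x = 3.5 + 0.2 Z recover mu = 3.5 and sigma = 0.2.
zs = [-1.0, -0.5, 0.0, 0.5]
xs = [3.5 + 0.2 * z for z in zs]
mu, sigma = regress_component(xs, zs)
```

Because \(Z = (x - \mu)/\sigma\), the slope of \(Z\) on \(x\) is \(1/\sigma\), so equation 15 is the reciprocal of that regression slope and equation 16 follows from the means.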

5. Final Estimate of Parameters for Rejects from
Steps 3 and 4. When a shift to step 5 is indicated,
the last set of parameters \(\mu_1\), \(\sigma_1\), \(\mu_2\), and \(\sigma_2\) from
step 3 or 4 is refined in four iterations. In the first,
a 0.27 or 30-percent tolerance, whichever is greater,
is allowed on the parameter estimates, thus giving
three values (low, mean, and high) for each of the
four parameters. These are combined in all possible
(that is, 81) ways. Values of \(P_1\), \(P_2\), \(Z_1\), and \(Z_2\) are
tested in the same manner as in step 4. Values of
\(\sum \Delta Z^2\) (and \(\sum \Delta P^2\)) are computed for all 81 sets,
and these sets are ranked in an ascending order of
magnitude of \(\sum \Delta Z^2\). The mean values of the
parameters in the first two sets yield a set for the second
iteration in which tolerances are reduced to
one-third of those for the previous set. The procedure ends
after completing the fourth iteration. Any more
iterations can change the parameter estimates at
most by 0.5 percent. Final \(\sum \Delta Z^2\) and the
corresponding \(\sum \Delta P^2\) are calculated.
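
The 81-set search of step 5 can be sketched as follows. This is a simplified illustration under stated assumptions: only the 30-percent relative tolerance is used (the alternative absolute tolerance mentioned in the text is omitted), fitted probabilities are clamped instead of applying the point-dropping tests of step 4, and all function names are hypothetical.

```python
from itertools import product
from statistics import NormalDist

STD = NormalDist()


def sum_sq_dz(xs, zs, params, a1=0.5):
    """Sum of squared differences between observed and fitted normal deviates."""
    mu1, s1, mu2, s2 = params
    first, second = NormalDist(mu1, s1), NormalDist(mu2, s2)
    total = 0.0
    for x, z in zip(xs, zs):
        p = a1 * first.cdf(x) + (1.0 - a1) * second.cdf(x)
        p = min(max(p, 1e-5), 1.0 - 1e-5)        # keep the inverse normal defined
        total += (z - STD.inv_cdf(p)) ** 2
    return total


def candidates(center, rel_tol):
    """Low, mean, and high values of each of four parameters: 3**4 = 81 sets."""
    triples = [(v * (1 - rel_tol), v, v * (1 + rel_tol)) for v in center]
    return list(product(*triples))


def refine(xs, zs, start, a1=0.5, rel_tol=0.30, rounds=4):
    """Rank the 81 sets, average the best two, shrink the tolerance, repeat."""
    center = tuple(start)
    for _ in range(rounds):
        ranked = sorted(candidates(center, rel_tol),
                        key=lambda c: sum_sq_dz(xs, zs, c, a1))
        center = tuple((b + s) / 2.0 for b, s in zip(ranked[0], ranked[1]))
        rel_tol /= 3.0                           # tolerances cut to one-third
    return center
```

After four rounds the tolerance has shrunk by a factor of 27, so a further round would move each parameter by only a small fraction of its value, in line with the 0.5-percent remark in the text.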

6. Variation in \(a_1\). Steps 3 through 5 yield the
best estimate of parameters \(\mu_1\), \(\sigma_1\), \(\mu_2\), and \(\sigma_2\) for
a given value of \(a_1\). If the computer is required to
yield a best-fit value of \(a_1\), these steps are repeated
for values of \(a_1\) greater and smaller than the
previously used value. The program computes an \(a_1\)
that will yield a minimum value of \(\sum \Delta Z^2\) by fitting
a second-degree curve to the sag portion of the \(a_1\)
vs \(\sum \Delta Z^2\) curve. This value of \(a_1\) is used to compute
\(\mu_1\), \(\sigma_1\), \(\mu_2\), and \(\sigma_2\).
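
Locating the minimum of a second-degree curve through three trial points can be sketched as below. The text does not say how many points the program fits, so the three-point form, the function name `parabola_vertex`, and the sample values are assumptions made for illustration.

```python
def parabola_vertex(p1, p2, p3):
    """Abscissa of the vertex of the second-degree curve through three points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    num = (x2 - x1) ** 2 * (y2 - y3) - (x2 - x3) ** 2 * (y2 - y1)
    den = (x2 - x1) * (y2 - y3) - (x2 - x3) * (y2 - y1)
    if den == 0:
        raise ValueError("points are collinear; no unique vertex")
    return x2 - 0.5 * num / den


# Hypothetical (a1, sum of squared delta-Z) pairs taken from y = (a1 - 0.6)^2 + 1.
trials = [(a, (a - 0.6) ** 2 + 1.0) for a in (0.4, 0.5, 0.7)]
best_a1 = parabola_vertex(*trials)
```

The vertex is a minimum only when the three points bracket a sag in the curve, which is why the step restricts the fit to that portion.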

7. Results and Plots. The computer prints out
the sample data; a ranked matrix with \(P\) and \(Z\)
vectors; values of \(\mu_1\), \(\sigma_1\), \(\mu_2\), \(\sigma_2\), \(a_1\), \(\sum \Delta Z^2\), and
\(\sum \Delta P^2\); a ranked matrix of observed and fitted \(Z\)
and \(P\) corresponding to the ranked \(x\); any points
dropped in the beginning or end of data; and a
CALCOMP plot showing the data points, two
component distributions, and the fitted mixed
distribution.

Application  to  Flood  Distributions 

As  a  test  of  the  suitability  of  the  two-distribution 
method  for  analyzing  flood  distributions,  the  annual 
flood  data  from  streams  in  Illinois  were  analyzed 
in  three  sets  as  given  in  table  1.  The  first  set  com- 
prised 12  available  long-term  (54  years)  records, 
and  the  second  and  third  sets  contained  streams 
with  flood  records  of  28  and  24  years,  respectively. 
The observed distribution parameters (mean, \(\bar{x}\);
standard deviation, \(s\); and coefficient of skew, \(C_s\))
and the computed two-distribution parameters
(assuming \(a_1 = 0.5\)) are given in table 2 for all 33
flood series analyzed.

For a stream, the mean of the absolute deviations
of computed floods from the observed floods, as a
percent of the mean annual flood, \(D\), is obtained
from

\[ D = \frac{100}{n} \sum \frac{\lvert F_c - F_o \rvert}{F_m} \tag{19} \]

in which \(F_c\) and \(F_o\) denote the computed and
observed floods for the same probability, \(P\); and \(F_m\)
represents the mean annual flood and equals \(10^{\bar{x}}\)
where \(\bar{x}\) is the mean of log-transformed floods.
Floods  were  computed  according  to  the  log-Pearson 
type  III  method,  recommended  by  the  Water  Re- 
sources Council  (15)  and  Prasad  (11),  using  the 
Pearson  standard  deviates  (6,  8,  14).  Values  of  D 
obtained  by  using  the  two-distribution  and  log- 
Pearson  type  III  methods  are  given  in  table  2  and 
plotted  on  log-normal  probability  paper  in  figure  1. 
Not  only  are  the  deviations  lower  in  the  case  of 
the  two-distribution  method,  but  also  the  prob- 
ability of  high  deviations  using  this  method  is  very 
much  less  than  from  the  log-Pearson  type  III 
method.  The  low  deviations  are  synonymous  with 
a  better  fit  over  the  flood  range  studied. 
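
The deviation statistic \(D\) of equation 19 can be computed as in the sketch below. The function name and the sample flood values are hypothetical, and the form of the equation follows the verbal definition in the text.

```python
def mean_percent_deviation(computed, observed, mean_log_flood):
    """Mean absolute deviation of computed from observed floods, in percent of F_m."""
    f_m = 10 ** mean_log_flood                   # F_m = 10 ** (mean of log10 floods)
    n = len(observed)
    total = sum(abs(fc - fo) for fc, fo in zip(computed, observed))
    return 100.0 * total / (n * f_m)


# Hypothetical floods at two probability levels, with a mean log flood of 2.0.
d = mean_percent_deviation([110.0, 90.0], [100.0, 100.0], 2.0)
```

Dividing by the mean annual flood makes \(D\) comparable between streams of very different size, which is what allows the values for all 33 series to be plotted together.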
To illustrate the type of fit obtained by the two
methods, the actual flood data and the fitted
distributions are plotted on log-normal probability paper
in figure 2 for two streams. For the Embarras River
at Ste. Marie and the Kaskaskia River at Vandalia,
respectively, the percent deviations, \(D\), are 6.9
and 8.9 using the two-distribution method, and 12.6
and 23.6 using the log-Pearson type III method.
The observed floods are defined satisfactorily by
curves with reverse curvature, which are easily
simulated by the mixed distribution curves from the
two-distribution method. Fits by the log-Pearson
type III method are poor throughout the range of
observed data, and become very poor in the high
flood ranges, which are of great interest to engineers
and hydrologists. The fit of the computed floods with
the observed floods was good to satisfactory for all
33 streams analyzed with the two-distribution
method, whereas the fit was quite poor for 12 of the
33 streams and fair to satisfactory for the remaining
streams with the log-Pearson type III method.
Therefore, the two-distribution method based on
the mixed distribution concept not only simulates
satisfactorily the observed flood distributions but
also provides good fit in the range of high floods.

Table 1. Illinois streams used in study

Stream                                                   USGS¹     Drainage area
number  Stream and gaging station                        number    (square miles)

        I. 54-year data (1915-68):
1       Embarras River at Ste. Marie                     3-3455    1,513
2       Little Wabash River below Clay City              3-3795    1,134
3       Pecatonica River at Freeport                     5-4355    1,330
4       Kankakee River at Momence                        5-5205    2,340
5       Kankakee River near Wilmington                   5-5275    5,250
6       Des Plaines River at Riverside                   5-5325    635
7       Fox River at Algonquin                           5-5500    1,402
8       Fox River at Dayton                              5-5525    2,570
9       Spoon River at Seville                           5-5700    1,600
10      Sangamon River at Monticello                     5-5720    550
11      Kaskaskia River at Vandalia                      5-5925    1,980
12      Big Muddy River at Plumfield                     5-5970    785

        II. 28-year data (1941-68):
13      Embarras River at Ste. Marie                     3-3455    1,513
14      North Fork Embarras River near Oblong            3-3460    319
15      Little Wabash River below Clay City              3-3795    1,134
16      Little Wabash River at Carmi                     3-3815    3,111
17      Kishwaukee River at Belvidere                    5-4385    525
18      South Branch Kishwaukee River near Fairdale      5-4395    386
19      Kishwaukee River near Perryville                 5-4400    1,090
20      Edwards River near Orion                         5-4660    163
21      Edwards River near New Boston                    5-4665    434
22      Bay Creek at Pittsfield                          5-5125    40
23      Bay Creek at Nebo                                5-5130    162

        III. 24-year data (1945-68):
24      Iroquois River at Iroquois                       5-5250    682
25      Iroquois River near Chebanse                     5-5260    2,120
26      Des Plaines River near Des Plaines               5-5290    359
27      Des Plaines River at Riverside                   5-5325    635
28      Vermilion River at Pontiac                       5-5545    568
29      Vermilion River at Lowell                        5-5555    1,230
30      Spoon River at London Mills                      5-5695    1,070
31      Spoon River at Seville                           5-5700    1,600
32      La Moine River at Colmar                         5-5845    655
33      La Moine River at Ripley                         5-5850    1,310

¹ U.S. Geological Survey.
So far equal weights have been assumed for the
two component distributions. For the 12 streams
with 54-year records, runs were made to obtain
values of \(a_1\) as selected by the computer. The fit
was slightly improved in the case of streams 4, 5, 8,
and 12 in table 1 for which the fitted value of \(a_1\)
ranged from 0.4 to 0.7. The magnitude of \(D\) for these
four streams was reduced from 2.9, 4.2, 7.2, and
7.7 to 2.6, 3.7, 7.0, and 6.7, respectively. This shows
that the assumption of \(a_1 = 0.5\) is satisfactory in
simulating annual flood distributions.

Table 2. Distribution parameters and mean deviations

        Observed flood data                  Two-distribution method (a1 = a2 = 0.5)          Deviations (D)
Stream
number  x̄           s           Cs           μ1           σ1           μ2           σ2           LPM¹         TDM²

1       4.105        0.357        -1.331       4.355        0.201        3.884        0.226        12.6         6.9
2       4.085        .334         -.309        4.303        .248         3.862        .276         9.5          8.1
3       [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  3.6
4       [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  2.9
5       [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  4.2
6       3.547        .329         -2.779       3.646        .121         3.479        .199         13.5         3.9
7       3.472        .194         -.602        3.551        .141         3.384        .227         3.8          4.0
8       [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  7.2
9       4.056        [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]
10      3.689        .302         -.336        3.838        [illegible]  [illegible]  [illegible]  [illegible]  [illegible]
11      4.094        .357         -1.202       4.119        .181         4.092        .416         23.6         8.9
12      3.839        .328         -.851        3.937        [illegible]  3.742        .439         15.5         7.7
13      4.101        [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]
14      3.805        .452         -1.281       4.002        [illegible]  [illegible]  [illegible]  [illegible]  [illegible]
15      4.048        .370         [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]
16      4.125        .261         .072         4.181        .414         4.075        .107         13.3         11.1
17      3.503        .330         -.068        3.760        .240         3.239        .245         15.6         10.6
18      3.526        [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]
19      3.790        [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]
20      3.447        .246         -.876        3.520        .113         3.362        .383         7.9          5.8
21      3.540        .250         -.638        3.681        .137         3.386        .299         7.7          6.1
22      3.671        .300         -1.144       3.876        .163         3.491        .233         7.1          6.0
23      3.825        .335         -.973        4.023        .166         3.629        .356         9.0          7.2
24      3.529        .209         .314         3.663        .218         3.399        .160         8.4          7.2
25      4.065        .169         .327         4.077        .230         4.052        .156         7.1          7.1
26      3.261        .228         -.303        3.384        .169         3.129        .261         6.9          5.7
27      3.548        .188         -.384        3.584        .184         3.520        .223         5.7          5.6
28      3.634        .256         .255         3.737        .335         3.540        .196         8.6          6.9
29      4.066        .231         -.035        4.050        .308         4.081        .202         7.1          4.8
30      4.028        .239         .403         4.160        .272         3.903        .183         7.6          2.8
31      4.099        .187         -.073        4.093        .231         4.100        .190         5.3          4.7
32      3.857        .280         -.063        4.083        .186         3.630        .211         10.4         5.8
33      3.940        .204         -.182        4.110        .125         3.767        .147         6.3          4.8

¹ LPM = Log-Pearson type III method.
² TDM = Two-distribution method.

Figure 1. Mean deviations using two-distribution and log-Pearson type III methods. (Horizontal axis: percent exceedance probability.)

The mean \(\mu\), variance \(\sigma^2\), and coefficient of skew
\(C_s\) of the fitted mixed distribution were computed
using the following equations (3):

\[ \mu = a_1 \mu_1 + a_2 \mu_2 \tag{20} \]

\[ \sigma^2 = a_1 \sigma_1^2 + a_2 \sigma_2^2 + a_1 a_2 (\mu_2 - \mu_1)^2 \tag{21} \]

\[ C_s = \frac{3 a_1 a_2 (\mu_1 - \mu_2)(\sigma_1^2 - \sigma_2^2) + a_1 a_2 (a_2 - a_1)(\mu_1 - \mu_2)^3}{\sigma^3}. \tag{22} \]

The computed values of \(\mu\), \(\sigma\), and \(C_s\) are plotted
against the observed \(\bar{x}\), \(s\), and \(C_s\), respectively, in figure 3. Fitted
means lie within ±0.4 percent of the observed
means. Fitted values of \(\sigma\) lie within ±20 percent of
the observed standard deviations. However, the
fitted coefficients of skew are not less than -0.85
whereas 6 of the 33 observed \(C_s\) values are less than -1.
These results point to the inadequacy of any method
using third and higher order moments to compute
parameters of the component distributions. The
excellent fit of the \(\mu\) and \(\bar{x}\) values, and the
reasonably good fit of the \(\sigma\) and \(s\) values, indicate that the
two-distribution method developed for computing
the component distribution parameters, even with
\(a_1 = a_2 = 0.5\), is not only objective but also
satisfactory.

Figure 2. Typical distribution curves fitted by two-distribution (TDM) and log-Pearson type III (LPM) methods to observed flood data. (Horizontal axis: percent nonexceedance probability.)

Figure 3. Parameters computed from observed flood data and from component distributions fitted by two-distribution method. (Panels: observed mean, \(\bar{x}\); observed standard deviation, \(s\); observed coefficient of skew, \(C_s\).)
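
Equations 20 to 22 can be evaluated directly from the five fitted parameters. The sketch below is illustrative only; the function name `mixture_moments` is hypothetical, and the example call uses the stream 1 values from table 2 with \(a_1 = 0.5\).

```python
def mixture_moments(a1, mu1, sigma1, mu2, sigma2):
    """Mean, variance, and coefficient of skew of a two-component normal mixture."""
    a2 = 1.0 - a1
    mean = a1 * mu1 + a2 * mu2                                           # equation 20
    var = a1 * sigma1 ** 2 + a2 * sigma2 ** 2 + a1 * a2 * (mu2 - mu1) ** 2   # equation 21
    third = (3.0 * a1 * a2 * (mu1 - mu2) * (sigma1 ** 2 - sigma2 ** 2)
             + a1 * a2 * (a2 - a1) * (mu1 - mu2) ** 3)
    skew = third / var ** 1.5                                            # equation 22
    return mean, var, skew


# Stream 1 of table 2 (log-transformed floods), equal weights.
mean, var, skew = mixture_moments(0.5, 4.355, 0.201, 3.884, 0.226)
```

With equal weights the cubic term in equation 22 vanishes, so the fitted skew depends only on the difference between the two component variances.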

Application to Monthly Runoff Distributions

The monthly streamflows for 30 streams in Illinois
have been analyzed using the two-distribution
method (\(a_1 = 0.5\)). The drainage areas vary from
40 to 3,000 square miles. As an example, the annual
cycles of variations in \(\mu_1\), \(\mu_2\), \(\sigma_1\), and \(\sigma_2\), obtained
from the log-transformed monthly streamflows, in
inches, for four streams are shown in figure 4. The
streams are:


Stream and gaging station                       USGS number   Sample size (years)

South Branch Kishwaukee River near Fairdale     5-4395        28
Henderson Creek near Oquawka                    5-4690        33
Salt Creek near Rowell                          5-5785        25
Skillet Fork at Wayne City                      3-3805        39

The shape of annual cycles is governed to a large
extent by monthly rainfall characteristics and soil
permeability. The South Branch Kishwaukee River
is located in northern Illinois, Henderson Creek in
western Illinois, Salt Creek in the central part, and
Skillet Fork in southern Illinois. The rainfall
generally increases from north to south and west to
east. Soils in southern Illinois are much less
permeable than those in the rest of the State. This
partly explains the greater variation in \(\sigma_1\) and \(\sigma_2\)
from month to month for the Skillet Fork. The
following points can be made from figure 4:

1. The annual cycle of \(\mu_1\) exhibits a minimum
during September or October, but the low persists
from July to January. For \(\mu_2\), the low generally
occurs a month earlier and persists from August
through October.

2. The values of \(\mu_2\) over the year are much
higher than the corresponding \(\mu_1\) values for the
streams analyzed.

3. The values of \(\sigma_2\) are generally high during
August through November, whereas the values of
\(\sigma_1\) are generally high during February and from
May through July.

4. The annual cycles of the component parameters are
well defined to the extent that they can be smoothed
to yield mean curves. The existence of these annual
cycles, which are symbolic of continuous
phenomena, indicates that the methodology developed
to simulate the mixed distributions is not only
objective but also satisfactory.

5. Since sample means are closer to population
means in comparison with variances, any abrupt
changes or seeming discontinuities in annual cycles,
because of seasonal changes or snowmelt periods,
are prominent only in \(\sigma_1\) and \(\sigma_2\) cycles. In other
words, seasonal changes have a greater effect on
the variability about the mean than on the mean
itself from month to month.

The fitted mixed distribution, the observed
monthly runoff values, and the two component
distributions are shown in figure 5 for the South
Branch Kishwaukee River in November,
Henderson Creek in October, Salt Creek in March, and
Skillet Fork in July. The mixed distributions fit
the data points quite satisfactorily over a wide range
of shapes. The fit may be improved further by
letting the computer select a suitable value of \(a_1\)
rather than taking it as 0.5 as was done in the above
analyses. On the average, one data point was
dropped from 4 months of data while testing for
the suitability of \(P\) and \(Z\) values as discussed in the
section "Computer Program."

The technique of decomposing a mixed
distribution as described in this paper can also be
applied after subtracting the deterministic component
from the monthly streamflow. The component and
mixed distributions offer much promise in the field
of sequential generation because third and higher
order moments, needed for preserving the skewness
and kurtosis in the parent nonnormal distribution,
are not required. The small-sample inaccuracies
of estimates of these higher order moments
increase rapidly and the estimated values can easily
be in error by several orders of magnitude (5).

Figure 5. Mixed distributions fitted to monthly streamflows.


Conclusions

Distributions of many hydrologic variables, such
as floods and monthly streamflows, can be
considered as mixed distributions. An objective method,
the two-distribution method, has been developed
for fitting a mixed distribution having only two
component distributions. The method has been
computerized for general application. It uses an
iteration procedure that minimizes the sum of
squared differences between the observed and fitted
normal deviates. The method of moments for
dissecting a mixed distribution is beset by errors and
uncertainties when third and higher order moments
are used. The observed distributions of annual
floods and monthly streamflows are simulated
satisfactorily by the mixed distributions generated by
the two-distribution method. The component and
mixed distributions offer much promise in the field
of sequential generation. Mathematically, the mixed
distribution model serves as a valuable tool in
transforming and analyzing observed distributions of
different shapes.